Abstract

We consider two elementary forms of string rewriting called guided
insertion/deletion and guided rewriting. The original strings are modified
depending on the match with a given set of auxiliary strings, called 
guides. Guided insertion/deletion considers matching of a string and 
a guide with respect to a specific correspondence of strings. Guided 
rewriting considers matching of a string and a guide with respect to 
an equivalence relation on the alphabet. Guided insertion/deletion is 
inspired by RNA-editing, a biological process by which the original
genetic information stored in DNA is modified before its final expression.
The formalism here allows for simultaneous insertion and deletion of 
string elements. Guided rewriting, based on a letter-to-letter relation, 
is technically more appealing than guided insertion/deletion. We prove 
that guided rewriting preserves regularity: for every regular language 
its closure under guided rewriting is regular too. In the proof we 
will rely on the auxiliary notion of a slice sequence. We establish a 
correspondence of slice sequences and guided rewrite sequences. Because
of their left-to-right nature, slice sequences are more convenient to deal 
with than guided rewrite sequences in the construction of the finite 
automata that we encounter in the proofs of regularity. Based on the 
result for guided rewriting we establish that guided insertion/deletion 
preserves regularity as well.

1 Introduction


RNA editing is a biological mechanism that modifies the original “text” of 
the genetic information of a living organism after it is copied (transcribed) 
from the DNA. In this paper, we investigate two elementary formalisms of 
string transformation which are inspired by RNA editing. We consider guided 
insertion/deletion, which is close to an editing mechanism as encountered 
in the living cell, and guided rewriting, based on an adjustment relation, 
which lends itself more easily to formal analysis. In both forms of string 
rewriting a substring of the original string is adapted when it matches a 
string from a specific set, called the set of guides. The set G of guides is 
fixed and finite. In guided insertion/deletion the guide and the part of the 
string that is rewritten are not required to be of the same length, but they 
need to be equal up to occurrences of a distinguished dummy symbol. In 
guided rewriting the guide and substring are equivalent symbol-by-symbol 
according to the adjustment relation, a chosen and fixed equivalence relation. 

Both flavors of rewriting preserve the finiteness of the initial set of 
strings. Assuming a finite set of guides G, in both cases only a finite set of 
strings can be obtained by repeatedly rewriting a given string. In this work
we show that regularity of the initial string set is also preserved in both
cases. Starting from a language $L$, we consider the extension $L_{i/d}$ of the
language with all the rewrites obtained by guided insertion/deletion and the
extension $L_G$ of the language obtained by adding all the adjustment-based
guided rewrites. The main results of the paper state that regularity of $L$
implies regularity of $L_{i/d}$ and regularity of $L_G$.

The motivation of this work is rooted in one of the basic processes of 
life which concerns the flow of genetic information. Initially, the original 
information stored in DNA molecules is faithfully copied to RNA by the 
process of transcription. In eukaryotic cells, i.e., cells that have a nucleus,
the RNA that is finally translated into proteins does not carry an exact
copy of the original information stored in the DNA. Instead, the RNA
string, which transmits the genetic information further along the chain, is a
modification obtained by post-processing. On an abstract level an RNA
molecule can be regarded as a string over the alphabet {C,G,A,U}. The
modification consists of insertion and deletion of these letters, also called
nucleotides, at multiple locations in the original string. The class of the
underlying adaptation mechanisms is collectively called RNA-editing.

The computational power of insertion-deletion systems for RNA-editing 
is studied in [20]: After abstracting away the biological details, an insertion 


RNA-Editing with Combined Insertion and 
Deletion Preserves Regularity


step is the replacement of a string uv by the string uav taken from a 
particular finite set of triples u,a,v. Similarly, a deletion step replaces 
uav by uv for another finite set of triples u,a,v. In [14] the restriction
is considered where u and v are both empty. Even this restricted mechanism
has full computational power, that is, all recursively enumerable languages
can be generated in this way.


Inspired by DNA recombination, Head proposes in [9] the notion of
splicing. The DNA molecules (strings) are modified by so-called splicing
rules. Each splicing rule is a tuple $r = (u_1, v_1; u_2, v_2)$. Given two words
$w_1 = x_1 u_1 v_1 y_1$ and $w_2 = x_2 u_2 v_2 y_2$ the rule $r$ produces the word
$w = x_1 u_1 v_2 y_2$. So, the word $w_1$ is split in between $u_1$ and $v_1$, the
word $w_2$ in between $u_2$ and $v_2$, and the resulting subwords $x_1 u_1$ and
$v_2 y_2$ are recombined into the word $w$. For splicing a closure result,
reminiscent of the one for guided insertion/deletion and guided rewriting
considered in this paper, has been established. Cast in our terminology, if
$L$ is a regular language and $S$ is a finite set of splicing rules, then $L_S$
is regular too, cf. [12, 15]. Here, $L_S$ is the least language containing $L$
and closed under the splicing rules of $S$.


Compared to the above described formal systems, natural RNA-editing 
mechanisms are very often quite limited. In most of the natural RNA-editing 
instances only the symbol U is inserted and deleted, instead of arbitrary 
strings a, see e.g. [1]. Motivated by this observation, we investigate guided 
insertion/deletion focusing on the special role of a distinguished symbol 0, a 
formal analog of the RNA letter U. A similar scheme, which however prohibited
simultaneous insertion and deletion of the special symbol, we considered
in [21]. To prove that under the present scheme regularity is preserved we
need the stepping stone of guided rewriting based on an abstract adjustment 
relation. In particular, we prove the regularity preservation theorem for 
guided insertion/deletion by using the analogous result for guided rewriting 
based on adjustment. 


The regularity result for the adjustment-based rewriting is proved 
by constructing a finite automaton that accepts the language $L_G$. The
construction procedure takes as input the set of guides G and a given finite 
automaton accepting the language L. A crucial point in the proof is the 
translation of the guided rewrite sequences into so-called slice sequences. 
The point is that, since guides may overlap, each guided rewrite step adds 
a ‘layer’ on top of the previous string. In this sense guided rewriting is 
vertically oriented. E.g., Figure 2 in Section 5 shows six rewrite steps of the 
string ebcfa yielding the string fbcfb involving eight layers in total. However,
in reasoning about recognition by a finite automaton a horizontal orientation
is more natural. One would like to sweep from left to right, so to speak. 
Again referring to Figure 2, five slices can be distinguished, viz. a slice for 
each symbol of the string ebcfa. The technical machinery developed in this 
paper allows for a transition between the two orientations. 

In order to obtain a regularity result for guided insertion/deletion
we apply a string transformation: for the language $L$ and finite set of
guides $G$ over the alphabet $\Sigma \cup \{0\}$ let $N$ be a bound on the number of
consecutive 0's in $G$. We adapt the alphabet $\Sigma \cup \{0\}$ to $\Sigma \cup \Theta$ by
introducing $N+1$ new symbols representing strings of 0's up to length $N$ and
a new symbol representing all strings of 0's longer than $N$. The transformation
we consider replaces in a string $u$ over the alphabet $\Sigma \cup \{0\}$ all its
maximal substrings of 0's by the corresponding symbol of $\Theta$, obtaining a
string $\bar{u}$ over the alphabet $\Sigma \cup \Theta$. In this way we obtain the
transformed language $\bar{L}$ and guide set $\bar{G}$ over $\Sigma \cup \Theta$. We establish
that the closure of a language $L$ over $\Sigma \cup \{0\}$ under guided
insertion/deletion with respect to the set of guides $G$ is regular iff the
closure of $\bar{L}$ under guided rewriting with respect to $\bar{G}$ is regular.

Paper layout. Section 2 provides the biological background of RNA- 
editing. The theorem on the preservation of regularity for guided insertion- 
deletion is presented in Section 3. The notion of guided rewriting based on 
an adjustment relation is introduced in Section 4; a corresponding theorem 
on the preservation of regularity for guided rewriting is formulated here too. 
To pave the way for the proof of the latter theorem, Section 5 introduces 
the notions of a rewrite sequence and of a slice sequence and establishes 
their relationship. Rewrite sequences record the subsequent guided rewrites 
that take place, slice sequences represent the cumulative effect of all rewrites 
at a particular position of the string being adjusted. Section 6 describes a 
construction of a finite automaton accepting the extended language $L_G$ for
a fixed set of guides $G$ and a finite automaton accepting the language $L$.
In Section 7 the proof is given that regularity of $L$ implies the regularity
of $L_{i/d}$. Section 8 wraps up with related work and concluding remarks.


2 Biological Motivation 


In this section we briefly describe the biological aspects of the RNA-editing 
mechanisms and provide the corresponding abstractions. 

In the living cell there are different kinds of RNA editing that vary in 
the type of edited RNA and the set of editing operations. In this paper we
focus on a type of editing that has been studied quite extensively from a
biological point of view and that involves simultaneous insertion and deletion
of uracil in messenger RNA (mRNA) [3]. (Some other types of RNA editing also
involve letter substitution, cf. e.g. [17].) Uracil is represented by the letter U. The
three other types of nucleotides for RNA, viz. adenine, guanine and cytosine, 
are represented by the letters A, G and C, respectively. 

The type of U-insertion/deletion editing we are dealing with occurs 
in the mitochondrial genes of kinetoplastid protozoa [19]. Kinetoplastids 
are single cell organisms that include parasites like Trypanosoma brucei 
and Crithidia fasciculata and that can cause serious diseases in humans 
and/or animals. Although the mitochondrial genes contain a relatively small 
amount of information, they are of utmost importance for the organism as a 
whole [5]. Apart from being interesting from a fundamental point of view, 
understanding of the RNA-editing mechanisms can be crucial in developing 
medicines for the corresponding diseases. 

Modifications of kinetoplastid mRNA are usually made within the 
coding regions. These are the parts that are translated into proteins, which 
are the building blocks of the cells. The coded information of the original 
gene can be altered and therefore expressed, i.e. translated into proteins, in 
a varying number of ways, depending on the environment in the cell. This 
provides additional flexibility as well as potential specialization of different 
parts of the organisms for particular functions. 


In the sequel we describe an idealized version of the mechanism for the 
insertion and deletion of U. More details can be found, for instance, in [19, 
1, 6, 18]. For simplicity we assume that only identical letters match with 
one another. In reality, the matching is based on complementarity, usually 
assuming the so-called Crick-Watson pairs: A matches with U and G matches 
with C. 

A single step in the mRNA editing involves two strands of RNA, a 
strand of (pre-edited) mRNA and a strand of guide RNA (gRNA), the latter 
typically referred to as the guide. We explain the mechanism for the insertion 
of uracil on the example given in Figure 1. We consider the mRNA fragment 
$u = N_1N_2N_3N_4N_5$ and the guide $g = N_2N_3UUUN_4$, where each $N_i$ can be an
arbitrary nucleotide A, G or C, but not U. Obviously, there is some match
between $u$ and $g$ involving the letters $N_2$, $N_3$, and $N_4$, which is partially
‘spoiled’ by the UUU sequence. Guide $g$ attaches to $u$ at positions where
the letters match. The matching substrings $N_2N_3$ and $N_4$ serve as anchors
(Fig. 1a).


E.P. de Vink, H. Zantema, D. Bosnački


Figure 1: Various stages of guided U-insertion


By means of enzyme machinery, i.e., a special complex of proteins (enzymes)
called the editosome [2], $u$ is split open between $N_3$ and $N_4$ (Fig. 1b
and 1c). Then the editosome fills the gap between the anchors using the
guide as a template. (Actually, different enzymes of the editosome complex
are responsible for cutting the mRNA strand at the first mismatch position
and adding the Us; however, here we can safely disregard these details.) For
each letter U in the guide the editosome adds a U in the gap. As a result
the mRNA string $u$ is transformed into $N_1N_2N_3UUUN_4N_5$ (Fig. 1d). In
general, one can have more than two anchors (involving only non-U letters) 
in which the guide and the mRNA strand match. In that case the mRNA is 
opened between each pair of anchors and all gaps between these anchors are 
filled with U such that the number of Us in the guide is matched. 


The deletion of Us from a strand of mRNA is implemented by a symmetrical
biochemical mechanism. We illustrate the deletion process with an example as
well. Assume the mRNA strand $u = N_1N_2N_3UUN_4N_5$ and the
guide $g = N_2N_3N_4$. As in the insertion case, $g$ initiates the editing by
attaching itself to $u$ at the matching positions $N_2$, $N_3$, and $N_4$. Only now
the enzymatic complex removes the mismatching UU substring between $N_3$
and $N_4$ to ensure a perfect match between the substring and the guide. As
a result the edited string $N_1N_2N_3N_4N_5$ is obtained. In general, we can have
several anchoring positions on the same string. In that case, all Us between
each two matching positions are removed from the mRNA.

Simultaneous insertions and deletions of U are also possible.
For instance, the guide $N_2N_3UUUN_4$ can induce parallel editing
of the string $N_1UN_2UN_3UN_4UN_5UN_6$, which results in the string
$N_1UN_2N_3UUUN_4UN_5UN_6$, where the U between $N_2$ and $N_3$ has been
deleted and two U's between $N_3$ and $N_4$ have been inserted. This is done by
the same biochemical mechanisms that are involved in separate insertions
and deletions. As in the other cases described above, we can have multiple
insertions and deletions induced by the same guide on the original pre-edited
sequence.

Abstracting from the biochemical details, for all three cases considered
above it is common that a strand $u = xyz$, such that $y$ equals $g$ up to
occurrences of U, is modified by the insertion and deletion mechanism and
becomes a string $v = xgz$. The rewriting system that we describe in the
sequel also applies to another case with the same effect. For example, consider
a guide $g = N_2N_3UUUN_4$ and a pre-edited mRNA $u = N_1N_2N_3UUN_4N_5N_6$.
Now, to obtain the match of the guide $g$ and a substring $y$ of $u$, a U is inserted
in $u$, resulting in the string $v = N_1N_2N_3UUUN_4N_5N_6$. If the U subsequence
in $y$ were longer though, as in the case of $u' = N_1N_2N_3UUUN_4N_5N_6$ and
$g' = N_2N_3UUN_4$, then the extra U in $u'$ is removed, resulting
in $v' = N_1N_2N_3UUN_4N_5N_6$.

For our purposes, the mRNA editing mechanism underlying U-insertion 
and deletion boils down to symbolic manipulations of strings. The common 
denominator of the above described editing mechanisms is that in a single 
step some substring y is replaced by a guide g for which y and g match 
modulo occurrences of the symbol U. In the rest of the paper the analog of 
the nucleotide U will be denoted by 0. 


3 Guided Insertion / Deletion 


Inspired by the biological scheme of editing of mRNA as discussed in the 
previous section, we study more abstract notions of guided insertion and 
deletion and guided rewriting based on an adjustment relation in the remainder
of this paper. In this section we address guided insertion and deletion,
turning to guided rewriting in Section 4. 

More precisely, fix an alphabet $\Sigma$ and distinguish $0 \notin \Sigma$. Put
$\Sigma_0 = \Sigma \cup \{0\}$. Choose a finite set $G \subseteq \Sigma_0^*$, with elements $g$
also referred to as guides. Reflecting the biological mechanism, we assume that
each $g \in G$ is not equal to the empty string $\varepsilon$ and that the first and
last letter of each $g \in G$ is not equal to 0. Hence,
$G \subseteq \Sigma \cup \Sigma \cdot \Sigma_0^* \cdot \Sigma$, or, more particularly,
$G \subseteq \Sigma \cdot (0^* \cdot \Sigma)^*$. Now a guided insertion/deletion step
$\Rightarrow_{i/d}$ with respect to $G$ is given by

$$u \Rightarrow_{i/d} v \iff u = xyz \land v = xgz \land g \in G \land \pi(y) = \pi(g)$$

where $y \in \Sigma \cup \Sigma \cdot \Sigma_0^* \cdot \Sigma$, and $\pi(y)$ and $\pi(g)$ are obtained
from $y$ and $g$, respectively, by removing their 0s. Thus, $\pi : \Sigma_0^* \to \Sigma^*$
is the homomorphism such that $\pi(\varepsilon) = \varepsilon$, $\pi(0) = \varepsilon$ and $\pi(a) = a$
for $a \in \Sigma$. So, intuitively, $g$ is anchored on
the substring $y$ of $u$ and sequences of 0s are adjusted as prescribed by the
guide $g$, in effect replacing the substring $y$ by the guide $g$ while maintaining
the prefix $x$ and suffix $z$.

As a simple example of a single guided insertion/deletion step, for $G = \{g\}$
with $g = bcb000ab0c$ and $u = a00bc00babcc00a00b$, we have $u \Rightarrow_{i/d} v$
for $v = a00bcb000ab0cc00a00b$. Here we have $u = a00 \cdot bc00babc \cdot c00a00b$,
$\pi(bc00babc) = bcbabc = \pi(bcb000ab0c)$ and $v = a00 \cdot bcb000ab0c \cdot c00a00b$.
Note that for the string $v$, being the result of a rewrite with guide $g$ itself with
only one possible anchoring, only trivial steps can be taken further. So,
the operation of guided insertion/deletion with the same guide $g$ at the
same position in a string is idempotent. However, anchorings may overlap.
Consider the set of guides $G = \{ aa0a, a0aa \}$, for example. Then the string
$aaa$ yields an infinite rewrite sequence

$$aaa \Rightarrow_{i/d} aa0a \Rightarrow_{i/d} a0aa \Rightarrow_{i/d} aa0a \Rightarrow_{i/d} a0aa \cdots$$

Still, from $aaa$ only finitely many different rewrites can be obtained by
insertion/deletion steps guided by this $G$, viz. $\{ aaa, aa0a, a0aa \}$.
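
The single step and the finite set of rewrites of one string can be made concrete with a small Python sketch. It is only an illustration under assumptions that are not part of the text: strings over $\Sigma_0$ are modelled as Python strings with the character `0` as the dummy symbol, and the names `pi`, `indel_steps` and `indel_closure` are hypothetical. Termination of the closure loop relies on the restriction that guides start and end with a letter of $\Sigma$.

```python
from collections import deque


def pi(s: str) -> str:
    """Erase the dummy symbol 0 (the homomorphism pi of the text)."""
    return s.replace("0", "")


def indel_steps(u: str, guides: set[str]) -> set[str]:
    """All strings v reachable from u by one guided insertion/deletion step."""
    results = set()
    for i in range(len(u)):
        if u[i] == "0":
            continue  # the substring y must start with a letter of Sigma
        for j in range(i, len(u)):
            if u[j] == "0":
                continue  # the substring y must end with a letter of Sigma
            y = u[i:j + 1]
            for g in guides:
                if pi(y) == pi(g):
                    results.add(u[:i] + g + u[j + 1:])
    return results


def indel_closure(u: str, guides: set[str]) -> set[str]:
    """All strings reachable from u by zero or more steps (breadth-first)."""
    seen = {u}
    queue = deque([u])
    while queue:
        w = queue.popleft()
        for v in indel_steps(w, guides):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


if __name__ == "__main__":
    print(indel_steps("a00bc00babcc00a00b", {"bcb000ab0c"}))
    print(sorted(indel_closure("aaa", {"aa0a", "a0aa"})))
```

Run on the two examples above, the sketch reproduces the single rewrite of $a00bc00babcc00a00b$ and the three-element set obtained from $aaa$.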

The restrictions put on $G$ exclude arbitrary deletions (possible if $\varepsilon$
were allowed as a guide) and infinite pumping (if guides need not be delimited
by symbols from $\Sigma$). As an illustration of the latter case, starting from the
string $abc$ and ‘guide’ $0ab$, the infinite sequence
$abc \Rightarrow_{i/d} 0abc \Rightarrow_{i/d} 00abc \Rightarrow_{i/d} 000abc \cdots$ would be obtained.
The restriction on the substring $y$ prevents changes outside the scope of the
guide $g$ and forbids $a0b000c \Rightarrow_{i/d} ab0c$ by way of the guide $ab$.


As a first observation we show that the set
$\mathcal{L}_{i/d} = \{ v \in \Sigma_0^* \mid u \Rightarrow_{i/d}^* v \}$, for any finite set of
guides $G$ and any string $u$, is finite. Write
$u = a_0 0^{i_1} a_1 \cdots a_{n-1} 0^{i_n} a_n$ where $a_k \in \Sigma$, $k = 0,\ldots,n$, and
$i_k \geq 0$, $k = 1,\ldots,n$, for some $n \geq 0$. In effect, a guided insertion/deletion
step only modifies the substrings $0^{i_k}$ or leaves them as is. Therefore, after
one or more guided insertion/deletion steps the substrings $0^{i_k}$ are strings
taken from the set

$$\mathcal{Z}_{i/d} = \{ 0^{i_k} \mid 1 \leq k \leq n \} \cup \{ 0^{\ell} \mid x a \cdot 0^{\ell} \cdot b z \in G,\ x, z \in \Sigma_0^*,\ a, b \in \Sigma,\ \ell \geq 0 \}$$

Thus, if $u \Rightarrow_{i/d}^* v$ then
$v \in \mathcal{U}_{i/d} = \{ a_0 z_1 a_1 \cdots a_{n-1} z_n a_n \mid z_k \in \mathcal{Z}_{i/d},\ 1 \leq k \leq n \}$,
i.e. $\mathcal{L}_{i/d} \subseteq \mathcal{U}_{i/d}$. Since the set of guides $G$ is finite, it follows that
$\mathcal{Z}_{i/d}$ is finite, that $\mathcal{U}_{i/d}$ is finite and that $\mathcal{L}_{i/d}$ is finite as well.

More generally, given a set of guides $G$, we define the extension by
insertion/deletion $L_{i/d}$ of a language $L$ over $\Sigma_0$ by putting
$L_{i/d} = \{ v \in \Sigma_0^* \mid \exists u \in L : u \Rightarrow_{i/d}^* v \}$. Cast in the
biological setting of Section 2, $L$ are the
strands of messenger RNA, $G$ are strands of guide RNA. Next, we consider
the question whether regularity of the language $L$ is inherited by the induced
language $L_{i/d}$. Note that, despite the finiteness of the insertion/deletion
scheme for a single string, it is not obvious that such a statement would hold.

With the machinery of rewrite sequences and slice sequences developed 
in the sequel of the paper, we will be able to prove the following for guided 
insertion/deletion.


Theorem 1. If $L$ is a regular language, then the language $L_{i/d}$ is regular
too.


We will prove Theorem 1 by applying a more general result on guided 
rewriting, viz. Theorem 2 formulated in the next section and ultimately 
proven in Section 6. In guided rewriting as developed in the sequel, symbols
are only replaced by single symbols, so lengths of strings are always
preserved. Since guided insertion/deletion may change the length of a string,
a transformation is required to be able to apply Theorem 2.

Before moving to guided rewriting we relate our results to those of [21]. 
There a relation similar to $\Rightarrow_{i/d}$ was introduced, with the only difference
that in a single step 0's are either deleted or inserted, but not both at the
same time. The consequence of this small difference is significant: the main
conclusion of [21] is that in that setting regularity is not preserved, which is 
the opposite of Theorem 1 in the present setting. 


4 Guided Rewriting 


The idea of guided rewriting is that a symbol is replaced by an equivalent 
symbol, equivalence taken with respect to some adjustment relation ~. 
The resulting one-to-one correspondence of the symbols of the string $u$ and its
guided rewrite $v$, enjoyed by this notion of reduction, will turn out to be
technically convenient in the sequel. Intuitively, the equivalent symbols
abstract from sequences of 0's.

Let $\Sigma$ be a finite alphabet and $\sim$ an equivalence relation on $\Sigma$, called
the adjustment relation. If $a \sim b$ we say that $a$ can be adjusted to $b$. For a
string $u \in \Sigma^*$ we write $\#u$ for its length, use $u[i]$ to denote its $i$-th element,
$i = 1,\ldots,\#u$, and let $u[p,q]$ stand for the substring $u[p]\,u[p+1] \cdots u[q]$. The
relation $\sim$ is lifted to $\Sigma^*$ by putting


$$u \sim v \iff \#u = \#v \land \forall i = 1,\ldots,\#u : u[i] \sim v[i]$$


Next we define the notion of guided rewriting that involves an adjustment 
relation. 


Definition 1. We fix a finite subset $G \subseteq \Sigma^*$, called the set of guides.


(a) For $u, v \in \Sigma^*$, $g \in G$, $p \geq 0$, we define $u \Rightarrow_{g,p} v$, stating that $v$ is the
rewrite of $u$ with guide $g$ at position $p$, by


$$u \Rightarrow_{g,p} v \iff \exists x, y, z \in \Sigma^* : u = xyz \land \#x = p \land y \sim g \land v = xgz$$


(b) We write $u \Rightarrow v$ if $u \Rightarrow_{g,p} v$ for some $g \in G$ and $p \geq 0$. We use $\Rightarrow^*$ to
denote the reflexive transitive closure of $\Rightarrow$. A sequence
$u_1 \Rightarrow u_2 \Rightarrow \cdots \Rightarrow u_n$ is called a reduction.


(c) For a language $L$ over $\Sigma$ and a set of guides $G$ we write


$$L_G = \{ v \in \Sigma^* \mid \exists u \in L : u \Rightarrow^* v \}$$


So, a $\Rightarrow$-step adjusts a substring to a guide in $G$ element-wise, and $L_G$
consists of all strings that can be obtained from a string from $L$ by any
number of such adjustments. Clearly, if $u \Rightarrow v$ then also $u \sim v$.

As an example, if $\Sigma = \{a,b,c\}$, $G = \{bb\}$ and $a \sim b$ but not $a \sim c$, then
by a $\Rightarrow$-step two consecutive symbols not equal to $c$ are replaced by two
consecutive $b$'s. In particular, $aaacaa \Rightarrow_{bb,1} abbcaa$ and
$abbcaa \Rightarrow_{bb,0} bbbcaa$. We have


$$\{aaacaa\}_G = \{ aaacaa, bbacaa, abbcaa, aaacbb, bbbcaa, abbcbb, bbacbb, bbbcbb \}$$
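
The closure of a single string under guided rewriting can be computed by a simple breadth-first search, since a string has only finitely many equivalent strings. The following Python sketch is an illustration only. It assumes that the adjustment relation is given as a dictionary `cls` mapping each symbol to a label of its equivalence class, and the names `equivalent`, `rewrite_steps` and `rewrite_closure` are hypothetical.

```python
from collections import deque


def equivalent(x: str, y: str, cls: dict[str, int]) -> bool:
    """Symbol-wise lifting of the adjustment relation to strings."""
    return len(x) == len(y) and all(cls[a] == cls[b] for a, b in zip(x, y))


def rewrite_steps(u: str, guides: set[str], cls: dict[str, int]) -> set[str]:
    """All v with u => v, i.e. one guided rewrite with some guide g at some p."""
    out = set()
    for g in guides:
        for p in range(len(u) - len(g) + 1):
            if equivalent(u[p:p + len(g)], g, cls):
                out.add(u[:p] + g + u[p + len(g):])
    return out


def rewrite_closure(u: str, guides: set[str], cls: dict[str, int]) -> set[str]:
    """The set {u}_G of all strings reachable from u by zero or more steps."""
    seen = {u}
    queue = deque([u])
    while queue:
        w = queue.popleft()
        for v in rewrite_steps(w, guides, cls):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


if __name__ == "__main__":
    cls = {"a": 0, "b": 0, "c": 1}  # a ~ b, c in a class of its own
    print(sorted(rewrite_closure("aaacaa", {"bb"}, cls)))
```

For the example above the search returns exactly the eight strings listed for $\{aaacaa\}_G$.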


Next, we state the main result of this paper regarding guided rewriting.

Theorem 2. Let an equivalence relation $\sim$ on $\Sigma$ and a finite set of guides $G$
be given. Suppose $L$ is a regular language. Then $L_G$ is regular too.


Before going to the proof, we first show that both finiteness of G and the 
requirement of ~ being an equivalence relation are essential. 

To see that finiteness of $G$ is essential for Theorem 2 to hold, let
$G = \{ c a^k c b^k c \mid k \geq 1 \}$ and $L = \mathcal{L}(ca^*ca^*c)$. Let $\sim$ satisfy
$a \sim b$ but not $a \sim c$. Then all elements of $L$ on which an adjustment is
applicable are of the shape $c a^k c a^k c$, where the result of the adjustment is
$c a^k c b^k c$, which cannot be changed by any further adjustment because of the
presence of $b$. So


$$L_G \cap \mathcal{L}(ca^*cb^*c) = \{ c a^k c b^k c \mid k \geq 1 \}$$


is not regular. Since regularity is closed under intersection we conclude
that $L_G$ cannot be regular itself. However, note that in this example the
set of guides is not only infinite, but not regular either. (We revisit this
issue in Section 8.)

Also the equivalence properties of $\sim$ are essential for Theorem 2. For
$G = \{ab\}$ and $\sim\ = \{ (a,b), (b,a) \}$ the only possible $\Rightarrow$-steps are replacing
the pattern $ba$ by $ab$. Note that here $\sim$ is neither reflexive nor transitive.
Since $ba$ may be replaced by $ab$, bubble sort on $a$'s and $b$'s can be mimicked
by $\Rightarrow^*$, while on the other hand $\Rightarrow^*$ preserves both the number of $a$'s and
the number of $b$'s. Hence


$$\mathcal{L}((ab)^*)_G \cap \mathcal{L}(a^*b^*) = \{ a^k b^k \mid k \geq 0 \}$$


which proves that $\mathcal{L}((ab)^*)_G$ is not regular, again since regularity is closed
under intersection.
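
The bubble-sort argument can be checked experimentally for small instances. The sketch below is a hedged illustration: the relation is passed as an arbitrary set of pairs `rel` (so it need not be an equivalence), and the helper names `related`, `steps` and `closure` are hypothetical and not taken from the text.

```python
import re
from collections import deque


def related(y: str, g: str, rel: set[tuple[str, str]]) -> bool:
    """Symbol-wise check y[i] ~ g[i] for an arbitrary relation given as pairs."""
    return len(y) == len(g) and all((a, b) in rel for a, b in zip(y, g))


def steps(u: str, guides: set[str], rel: set[tuple[str, str]]) -> set[str]:
    """One guided rewrite step with respect to the (possibly non-equivalence) relation."""
    out = set()
    for g in guides:
        for p in range(len(u) - len(g) + 1):
            if related(u[p:p + len(g)], g, rel):
                out.add(u[:p] + g + u[p + len(g):])
    return out


def closure(u: str, guides: set[str], rel: set[tuple[str, str]]) -> set[str]:
    """All strings reachable from u by zero or more steps."""
    seen, queue = {u}, deque([u])
    while queue:
        w = queue.popleft()
        for v in steps(w, guides, rel):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


if __name__ == "__main__":
    rel = {("a", "b"), ("b", "a")}  # neither reflexive nor transitive
    for k in range(4):
        reached = closure("ab" * k, {"ab"}, rel)
        # keep only the strings of the shape a*b*
        print(k, sorted(v for v in reached if re.fullmatch("a*b*", v)))
```

For each $k$ the only reachable string of the shape $a^*b^*$ is $a^k b^k$, in line with the intersection computed above.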


5 Rewrite Sequences and Slice Sequences 


This section introduces an auxiliary notion, viz. the notion of a slice sequence, 
that can be considered as a ‘vertical’ version of the ‘horizontal’ notion of a 
rewrite sequence. We will establish a correspondence between these notions, 
which provides the basis of our proof of Theorem 2 in Section 6. 


Fix an alphabet $\Sigma$, an adjustment relation $\sim$, and a set of guides $G$.


Definition 2. A sequence $\varrho = (g_k, p_k)_{k=1}^{r}$ of guide-position pairs is called
a guided rewrite sequence for a string $u \in \Sigma^*$ if it holds that (i) $g_k \in G$,
(ii) $0 \leq p_k \leq \#u - \#g_k$, and (iii) $u[p_k+1, p_k+\#g_k] \sim g_k$, for all $k = 1,\ldots,r$.

A guide-position pair $(g,p)$ indicates an intended guided rewrite with $g$ of the
string $u$ at position $p$. For the rewrite to fit we must have $p + \#g \leq \#u$. The
first $p$ symbols of $u$, i.e. the substring $u[1,p]$, are not affected by the rewrite,
nor are the last $\#u - p - \#g$ symbols of $u$, i.e. the substring $u[p+\#g+1, \#u]$.

The sequence $\varrho$ induces a sequence of strings $(u_k)_{k=0}^{r}$ by putting $u_0 = u$
and $u_k$ such that $u_{k-1} \Rightarrow_{g_k,p_k} u_k$ for $k = 1,\ldots,r$. To conclude that
$u_{k-1} \Rightarrow_{g_k,p_k} u_k$ is indeed a proper guided rewrite step, in particular that we
have $u_{k-1}[p_k+1, p_k+\#g_k] \sim g_k$, we use the assumption $u[p_k+1, p_k+\#g_k] \sim g_k$
and the fact that if $u \Rightarrow_{g,p} v$ then $u[p+1, p+\#g] \sim v[p+1, p+\#g]$
and $u \sim v$. Therefore, by induction $u \Rightarrow^* u_{k-1}$ and
$u[p_k+1, p_k+\#g_k] \sim u_{k-1}[p_k+1, p_k+\#g_k]$.

The final string $u_r$ of the guided rewrite sequence is referred to as the
yield of $\varrho$ for $u$, notation $yield(\varrho)$. Conversely, every specific reduction from $u$
to $v$ gives rise to a corresponding guided rewrite sequence for $u$.

A guided rewrite sequence $\varrho = (g_k, p_k)_{k=1}^{r}$ is said to be repetition-free if all
its guide-position pairs are different, i.e. for $1 \leq k_1, k_2 \leq r$,
$g_{k_1} = g_{k_2} \land p_{k_1} = p_{k_2}$ implies $k_1 = k_2$.
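
The conditions of Definition 2 and the induced yield are easy to state operationally. The following Python sketch is an illustration under assumptions of our own: a rewrite sequence is a list of pairs `(g, p)` with `g` a string and `p` a 0-based prefix length as in the text, the adjustment relation is a dictionary `cls` from symbols to class labels, and all function names are hypothetical.

```python
def equivalent(x: str, y: str, cls: dict[str, int]) -> bool:
    """Symbol-wise lifting of the adjustment relation to strings."""
    return len(x) == len(y) and all(cls[a] == cls[b] for a, b in zip(x, y))


def is_rewrite_sequence(u, seq, guides, cls) -> bool:
    """Check conditions (i)-(iii) of Definition 2 against the ORIGINAL string u."""
    for g, p in seq:
        if g not in guides:                         # (i)   g_k in G
            return False
        if not (0 <= p <= len(u) - len(g)):         # (ii)  the guide fits
            return False
        if not equivalent(u[p:p + len(g)], g, cls):  # (iii) u[p+1, p+#g] ~ g
            return False
    return True


def yield_of(u: str, seq) -> str:
    """Apply the rewrites in order and return the final string."""
    for g, p in seq:
        u = u[:p] + g + u[p + len(g):]
    return u


def is_repetition_free(seq) -> bool:
    """No guide-position pair occurs twice."""
    return len(set(seq)) == len(seq)


if __name__ == "__main__":
    cls = {"a": 0, "b": 0, "c": 1}
    seq = [("bb", 1), ("bb", 0)]
    print(is_rewrite_sequence("aaacaa", seq, {"bb"}, cls))
    print(yield_of("aaacaa", seq))
```

Note that condition (iii) is checked against $u$ itself and not against the intermediate strings; as argued above, this suffices because every intermediate string is equivalent to $u$.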


Definition 3. Let $a \in \Sigma$. A sequence $s\ell = (g_i, q_i)_{i \in I}$ of guide-offset pairs,
for $I \subseteq \mathbb{N}$ a finite index set, is called a slice for $a$ and $G$ if it holds that
(i) $g_i \in G$, (ii) $1 \leq q_i \leq \#g_i$, and (iii) $a \sim g_i[q_i]$, for all $i \in I$. The slice $s\ell$
is called a slice for a string $u \in \Sigma^*$ at position $n$, $1 \leq n \leq \#u$, if it is a slice
for $u[n]$.


A position p refers to the symbol u[p] of a string u. In contrast, in a guide- 
offset pair (g,q) of a slice sequence, the offset q is relative to the guide g. 
Since we require 1 < q < #g for such a pair, the symbol g[q] is well-defined. 
We will reserve the use of $q$ for offsets, indices within a guide, and the use
of p for positions after which a rewrite may take place, i.e. for lengths of 
proper substrings of a given string. 

The goal of the notion of slice is to summarize the effect of a number 
of guided rewrites local to a specific position within a string. The symbol 
generated by the last rewrite that affected the position, i.e. the particular 
symbol of the last element of the slice sequence, is part of the overall 
outcome of the total rewrite. This symbol is called the yield of the slice. 
More precisely, if $I \neq \emptyset$, the yield of a slice $s\ell$ for a symbol $a$ is defined
as $yield(s\ell) = g_{i_{max}}[q_{i_{max}}]$ where $i_{max} = \max(I)$. In case $I = \emptyset$, we put
$yield(s\ell) = a$. Occasionally we write $a \sim s\ell$, as for a slice $s\ell$ for a symbol $a$
it always holds that $a \sim yield(s\ell)$.

A slice $s\ell = (g_i, q_i)_{i \in I}$ is said to be repetition-free if, for $i_1, i_2 \in I$,
$g_{i_1} = g_{i_2} \land q_{i_1} = q_{i_2}$ implies $i_1 = i_2$. If we have $I = \emptyset$, the slice
$s\ell$ is called the empty slice.


Next we consider sequences of slices, and investigate the relationship between 
slices on two consecutive positions in a guided rewrite sequence. 


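Definition 4. A sequence $\sigma = (s\ell_n)_{n=1}^{\#u}$ is called a slice sequence for a
string $u$ if the following holds:


• $s\ell_n$ is a slice for $u$ at position $n$, for $n = 1,\ldots,\#u$;


• for $n = 1,\ldots,\#u-1$, putting $s\ell_n = (g_i, q_i)_{i \in I}$ and
$s\ell_{n+1} = (g'_j, q'_j)_{j \in J}$, there exists a monotone partial injection
$\gamma_n : I \to J$ such that, for all $i \in I$ and $j \in J$,


(i) $i \notin dom(\gamma_n) \Rightarrow q_i = \#g_i$
(ii) $\gamma_n(i) = j \Rightarrow g_i = g'_j \land q_i + 1 = q'_j$
(iii) $j \notin rng(\gamma_n) \Rightarrow q'_j = 1$


• the slices $s\ell_1$ and $s\ell_{\#u}$, say $s\ell_1 = (g_i, q_i)_{i \in I}$ and
$s\ell_{\#u} = (g'_j, q'_j)_{j \in J}$, satisfy $q_i = 1$, for all $i \in I$, and
$q'_j = \#g'_j$, for all $j \in J$, respectively.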


For the slices $s\ell_n$ and $s\ell_{n+1}$, the mapping $\gamma_n : I \to J$ is called the cut for
$s\ell_n$ and $s\ell_{n+1}$. It witnesses that $s\ell_n$ and $s\ell_{n+1}$ match in the sense that a
rewrite (i) may end at position $n$, (ii) may continue for its next offset at
position $n+1$, and (iii) may start at position $n+1$. Note, for arbitrary pairs of
slices the cut may not exist. In fact, the requirements of Definition 4
completely determine the cut between two slices. Since a cut $\gamma$ is an
order-preserving bijection from $dom(\gamma)$ to $rng(\gamma)$, and $dom(\gamma)$ and $rng(\gamma)$
are finite, it follows that for two slices $s\ell$, $s\ell'$ the cut for $s\ell$ and $s\ell'$,
if it exists, is unique. We write $s\ell \leadsto s\ell'$ in that case. A slice
$s\ell = (g_i, q_i)_{i \in I}$ is called a start slice if $q_i = 1$ for all $i \in I$. Similarly, $s\ell$ is
called an end slice if $q_i = \#g_i$ for all $i \in I$. A start slice is generally, but not
necessarily, associated with the first position of the string that is rewritten,
an end slice with the last position. Note, a start slice as well as an end slice
are allowed to be empty. The yield of the slice sequence $\sigma$ is the sequence of
the yields of its slices, i.e. we define $yield(\sigma) = yield(s\ell_1) \cdots yield(s\ell_{\#u})$.


Example 1. Let $\sim$ be the adjustment relation with equivalence classes
$\{a,b\}$, $\{c,d\}$, $\{e,f\}$ and let the set of guides $G$ be given by $G = \{ g_1, g_2, g_3 \}$
where $g_1 = fb$, $g_2 = ace$ and $g_3 = d$. For the string $u = ebcfa$ we consider the
guided rewrite sequence $\varrho = ( (g_3,2), (g_1,0), (g_2,1), (g_1,0), (g_1,3), (g_1,3) )$.
The associated reduction looks like


$$ebcfa \Rightarrow_{g_3,2} ebdfa \Rightarrow_{g_1,0} fbdfa \Rightarrow_{g_2,1} facea \Rightarrow_{g_1,0} fbcea \Rightarrow_{g_1,3} fbcfb \Rightarrow_{g_1,3} fbcfb \qquad (1)$$


Recording what happens at all of the five positions of the string $u$ yields, for
this example, the slice sequence $\sigma = (s\ell_n)_{n=1}^{5}$ given in Table 1. The slice
sequence is visualized in Figure 2.


          | $I_n$   | $(g_i, q_i)_{i \in I_n}$
$s\ell_1$ | 2, 4    | 2 ↦ $(g_1,1)$, 4 ↦ $(g_1,1)$
$s\ell_2$ | 2, 3, 4 | 2 ↦ $(g_1,2)$, 3 ↦ $(g_2,1)$, 4 ↦ $(g_1,2)$
$s\ell_3$ | 1, 3    | 1 ↦ $(g_3,1)$, 3 ↦ $(g_2,2)$
$s\ell_4$ | 3, 5, 6 | 3 ↦ $(g_2,3)$, 5 ↦ $(g_1,1)$, 6 ↦ $(g_1,1)$
$s\ell_5$ | 5, 6    | 5 ↦ $(g_1,2)$, 6 ↦ $(g_1,2)$


Table 1: slice sequence of Example 1

For the particular choice of $I_1,\ldots,I_5$, the monotone partial injection
$\gamma_n$, $n = 1,\ldots,4$, maps every number in its domain to itself. It is easily
checked that all requirements of a slice sequence hold. The ovals covering
guide-offset pairs reflect the cuts as mappings between two adjacent slices.
However, they also comprise complete guides across a varying number of slices.
Note, $s\ell_1$ is a start slice, $s\ell_5$ is an end slice. We have for the slice
sequence $\sigma = (s\ell_n)_{n=1}^{5}$ that
$yield(\sigma) = yield(s\ell_1) \cdots yield(s\ell_5) = fbcfb$. Indeed, this coincides with
the yield of the guided rewrite sequence $\varrho$ of (1).
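
The requirements of Definition 4 can be checked mechanically for the slice sequence of Table 1. The Python sketch below is illustrative only. It assumes a slice is represented as a dictionary from indices to pairs `(g, q)` with `g` the guide as a string and `q` a 1-based offset, and the names `cut`, `slice_yield` and `is_slice_sequence` are hypothetical. The function `cut` exploits the observation that the cut is completely determined: its domain consists of the indices whose guide has not ended yet, its range of the indices whose guide did not just start.

```python
def cut(left: dict, right: dict):
    """Return the unique cut between two adjacent slices, or None if none exists."""
    dom = sorted(i for i, (g, q) in left.items() if q < len(g))   # forced by (i)
    rng = sorted(j for j, (g, q) in right.items() if q > 1)       # forced by (iii)
    if len(dom) != len(rng):
        return None
    gamma = dict(zip(dom, rng))  # the only monotone bijection from dom to rng
    for i, j in gamma.items():
        (g, q), (h, r) = left[i], right[j]
        if g != h or q + 1 != r:                                  # condition (ii)
            return None
    return gamma


def slice_yield(a: str, sl: dict) -> str:
    """Symbol produced by the pair with maximal index, or a for the empty slice."""
    if not sl:
        return a
    g, q = sl[max(sl)]
    return g[q - 1]


def is_slice_sequence(u: str, slices: list, guides: set, cls: dict) -> bool:
    if len(slices) != len(u):
        return False
    for a, sl in zip(u, slices):  # each sl must be a slice for the symbol a
        for g, q in sl.values():
            if g not in guides or not (1 <= q <= len(g)) or cls[a] != cls[g[q - 1]]:
                return False
    if slices:
        if any(q != 1 for _, q in slices[0].values()):        # start slice
            return False
        if any(q != len(g) for g, q in slices[-1].values()):  # end slice
            return False
    return all(cut(l, r) is not None for l, r in zip(slices, slices[1:]))


if __name__ == "__main__":
    cls = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2}
    g1, g2, g3 = "fb", "ace", "d"
    table1 = [
        {2: (g1, 1), 4: (g1, 1)},
        {2: (g1, 2), 3: (g2, 1), 4: (g1, 2)},
        {1: (g3, 1), 3: (g2, 2)},
        {3: (g2, 3), 5: (g1, 1), 6: (g1, 1)},
        {5: (g1, 2), 6: (g1, 2)},
    ]
    print(is_slice_sequence("ebcfa", table1, {g1, g2, g3}, cls))
    print("".join(slice_yield(a, sl) for a, sl in zip("ebcfa", table1)))
```

On the data of Table 1 every cut is the identity on its domain and the computed yield is the string $fbcfb$.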


The rest of this section is devoted to proving that the above holds in general: 
Given a string and a set of guides, for every guided rewrite sequence there 
exists a slice sequence and for every slice sequence there exists a guided 
rewrite sequence. Moreover, the yield of the guided rewrite sequence and 
slice sequence are the same. 


Theorem 3. Let $\varrho = (g_k, p_k)_{k=1}^{r}$ be a guided rewrite sequence for a string $u$.
Then there exists a slice sequence $\sigma = (s\ell_n)_{n=1}^{\#u}$ for $u$ such that
$yield(\sigma) = yield(\varrho)$.


Proof. Induction on $r$. If $\varrho$ is the empty rewrite sequence, we take for $\sigma$ the
slice sequence of $\#u$ empty slices. Then we have $yield(\varrho) = u$ and $yield(\sigma) = u$.

Figure 2: slice sequence of Example 1

Suppose $\varrho$ is non-empty. Let $(u_k)_{k=0}^{r}$ be the sequence of strings induced
by $\varrho$. By induction hypothesis there exists a slice sequence $\sigma'$ for the first
$r-1$ steps of $\varrho$. Suppose $u_{r-1} \Rightarrow_{g_r,p_r} u_r$. The slice sequence $\sigma$ is obtained
by extending the slices of $\sigma'$ from position $p_r+1$ to $p_r+\#g_r$ with the pairs
$(g_r, n-p_r)$, where $n$ is the position of the slice. Then,

\[ yield(\sigma) = yield(\sigma'[1, p_r]) \cdot g_r[1, \#g_r] \cdot yield(\sigma'[p_r+\#g_r+1, \#u]) \]
\[ \phantom{yield(\sigma)} = u_{r-1}[1, p_r] \cdot g_r \cdot u_{r-1}[p_r+\#g_r+1, \#u_{r-1}] = u_r = yield(\varrho) \]

Verification of $\sigma$ being a slice sequence for $u$ requires transitivity of $\sim$.
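The inductive construction in this proof can be sketched in a few lines. The sketch below assumes, as in Table 1, that the index of a guide-offset pair is the number of the rewrite step that produced it, and that the yield of a slice is the letter selected by its highest index (or the original letter for an empty slice); the function names are illustrative.

```python
GUIDES = {"g1": "fb", "g2": "ace", "g3": "d"}
RHO = [("g3", 2), ("g1", 0), ("g2", 1), ("g1", 0), ("g1", 3), ("g1", 3)]

def slices_from_rewrites(u, sequence):
    """One slice per position of u: a list of (index, guide name, offset in guide)."""
    slices = [[] for _ in u]
    for step, (name, p) in enumerate(sequence, start=1):
        for q in range(1, len(GUIDES[name]) + 1):
            # step number `step` touches position p+q (1-based) with letter g[q]
            slices[p + q - 1].append((step, name, q))
    return slices

def slice_yield(u, slices):
    """Letter per position: topmost pair of the slice, or the letter of u if empty."""
    out = []
    for letter, sl in zip(u, slices):
        if sl:
            _, name, q = max(sl)          # highest index wins
            out.append(GUIDES[name][q - 1])
        else:
            out.append(letter)
    return "".join(out)

SIGMA = slices_from_rewrites("ebcfa", RHO)
print([[i for i, _, _ in sl] for sl in SIGMA])  # index sets I_1 .. I_5
print(slice_yield("ebcfa", SIGMA))
```

For the rewrite sequence of Example 1 the computed index sets agree with the column $I_n$ of Table 1.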


In order to show the reverse of Theorem 3 we proceed in a number of stages. 
First we need to relate individual guide-offset pairs in neighboring slices. For 
this purpose we introduce the ordering < on so-called chunks. 


Definition 5. Let $\sigma = (s\ell_n)_{n=1}^{\#u}$ be a slice sequence for $u$. Assume we have
$s\ell_n = (g_{n,i}, q_{n,i})_{i \in I_n}$, for $n = 1, \ldots, \#u$. Let $\gamma_n : I_n \to I_{n+1}$ be the cut for $s\ell_n$
and $s\ell_{n+1}$, $1 \leq n < \#u$. Define $X = \{ (g_{n,i}, q_{n,i}, i, n) \mid 1 \leq n \leq \#u,\ i \in I_n \}$
to be the set of chunks of $\sigma$ and define the ordering $\preceq$ on $X$ by putting
$(g,q,i,n) \preceq (g',q',i',n')$ iff

• either $n' \geq n$ and there exist indexes $\ell_0, h_0, \ldots, \ell_{n'-n}, h_{n'-n}$ such that

  - $\ell_k, h_k \in I_{n+k}$ and $\ell_k \leq h_k$, $0 \leq k \leq n'-n$
  - $h_k \in dom(\gamma_{n+k})$ and $\gamma_{n+k}(h_k) = \ell_{k+1}$, $0 \leq k < n'-n$
  - $\ell_0 = i$ and $h_{n'-n} = i'$

• or $n' \leq n$ and there exist indexes $\ell_0, h_0, \ldots, \ell_{n-n'}, h_{n-n'}$ such that

  - $\ell_k, h_k \in I_{n'+k}$ and $\ell_k \leq h_k$, $0 \leq k \leq n-n'$
  - $\ell_k \in dom(\gamma_{n'+k})$ and $\gamma_{n'+k}(\ell_k) = h_{k+1}$, $0 \leq k < n-n'$
  - $h_0 = i'$ and $\ell_{n-n'} = i$

Note, for indices $\ell_k, h_k \in I_{n+k}$ as above, we have $\ell_k \leq h_k$, so $\ell_k$ is the lower
index, $h_k$ is the higher index. In the above setting with $n' \geq n$, we say
that the sequence $\ell_0, h_0, \ell_1, h_1, \ldots, \ell_{n'-n}, h_{n'-n}$ is leading from $i \in I_n$ up to
$i' \in I_{n'}$. Likewise for the case where $n' \leq n$.

For example, for the slice sequence $(s\ell_n)_{n=1}^{5}$ of Figure 2, to identify the
guide belonging to the guide-offset pair $(g_2,1)$ of slice $s\ell_2$, the pair is more
precisely represented by the chunk $(g_2,1,3,2)$, for the pair is associated
with index $3 \in I_2$ of slice $s\ell_2$. Since for the cuts $\gamma_2 : I_2 \to I_3$ and
$\gamma_3 : I_3 \to I_4$ we have $\gamma_2(3) = 3$ and $\gamma_3(3) = 3$, we have
$(g_2,1,3,2) \preceq (g_2,2,3,3) \preceq (g_2,3,3,4)$ via the sequence 3,3,3,3 connecting
$(g_2,1)$ and $(g_2,2)$, and 3,3,3,3 connecting $(g_2,2)$ and $(g_2,3)$. (Hence the
combination of these sequences is 3,3,3,3,3,3 which connects $(g_2,1)$ and $(g_2,3)$
directly.) As no jumps from a low index $\ell$ to a high index $h$ need to be taken, we
also have $(g_2,1,3,2) \succeq (g_2,2,3,3) \succeq (g_2,3,3,4)$. Thus
$(g_2,1,3,2) \equiv (g_2,2,3,3) \equiv (g_2,3,3,4)$. In fact,
$\{ (g_2,1,3,2), (g_2,2,3,3), (g_2,3,3,4) \}$ is an equivalence
class for $\equiv$ corresponding to the guide $g_2$ (cf. Lemma 1). Differently, we
have $(g_2,1,3,2) \preceq (g_1,2,6,5)$ relating $g_2$ to the fourth occurrence of $g_1$ via
the sequence 3,3,3,3,3,5,5,6, for example. Since there is a jump here
from $\ell_2 = 3$ to $h_2 = 5$, we do not have $(g_2,1,3,2) \succeq (g_1,2,6,5)$. The
ordering $(g_2,1,3,2) \preceq (g_1,2,6,5)$ reflects that apparently the rewrite with
this occurrence of $g_1$ is on top of part of the rewrite using $g_2$ as guide.


Given a slice sequence $\sigma$, the ordering $\preceq$ on the chunks of $\sigma$ in $X$ gives rise
to a partial ordering on the set $X/{\equiv}$ of equivalence classes of chunks. As we
will argue, the equivalence classes correspond to guides and their ordering
corresponds to the relative order in which the guides occur in a rewrite
sequence $\varrho$ having the same yield as the slice sequence $\sigma$.

Lemma 1. (a) The relation $\preceq$ on $X$ is reflexive and transitive.

(b) The relation $\equiv$ on $X$ such that $x \equiv y \iff x \preceq y \land y \preceq x$ is an
equivalence relation.

(c) The ordering $\preceq$ on $X/{\equiv}$ induced by $\preceq$ on $X$ by
$[x] \preceq [y] \iff \exists x' \in [x]\ \exists y' \in [y] : x' \preceq y'$, makes $X/{\equiv}$ a partial order.


Proof. We only prove part (a); parts (b) and (c) are straightforward. To
verify reflexivity of $\preceq$, let $(g,q,i,n) \in X$. Choose $\ell_0 = i$ and $h_0 = i$. Then
$\ell_0, h_0 \in I_n$, $\ell_0 \leq h_0$, and, obviously, $\ell_0 = i$ and $h_0 = i$. So,
$(g,q,i,n) \preceq (g,q,i,n)$.

To verify transitivity of $\preceq$, assume $(g_1,q_1,i_1,n_1) \preceq (g_2,q_2,i_2,n_2)$ and
$(g_2,q_2,i_2,n_2) \preceq (g_3,q_3,i_3,n_3)$. We check that $(g_1,q_1,i_1,n_1) \preceq (g_3,q_3,i_3,n_3)$
for the case $n_3 \leq n_1 \leq n_2$, leaving the other cases, which are similar or
easier, to the reader. Pick $\ell_k, h_k \in I_{n_1+k}$, for $0 \leq k \leq n_2-n_1$, meeting
the first set of requirements of Definition 5, and pick $\ell'_j, h'_j \in I_{n_3+j}$, for
$0 \leq j \leq n_2-n_3$, meeting the second set of requirements. Consider the
sequence of indices $\ell'_0, h'_0, \ldots, \ell_0, h'_{n_1-n_3}$, which is the initial part of the
sequence from $i_2 \in I_{n_2}$ up to $i_3 \in I_{n_3}$, viz. the first $n_1-n_3$ out of $n_2-n_3$
pairs of indices, except that $\ell'_{n_1-n_3}$ has been replaced by $\ell_0$. We check that
the second set of requirements of Definition 5 holds for this sequence, making
it a sequence leading from $i_1 \in I_{n_1}$ up to $i_3 \in I_{n_3}$. It is straightforward to
check that the requirements are being met, except for $\ell_0 \leq h'_{n_1-n_3}$. This
follows from the fact that $\ell_0 = i_1$ is related to $h_{n_2-n_1} = i_2$ by a sequence
of indices respecting the ordering on the index sets $I_{n_1+k}$ or related by an
order-preserving mapping $\gamma_{n_1+k}$, and $h'_{n_1-n_3} \in I_{n_1}$ is related to $\ell'_{n_2-n_3} = i_2$
by a sequence of indices respecting the ordering on the index sets $I_{n_3+j}$ or
related by an order-preserving mapping $\gamma_{n_3+j}$ too, $n_1-n_3 \leq j \leq n_2-n_3$.
Therefore, we have $\ell_0$ '$\leq$' $h_{n_2-n_1} = i_2 = \ell'_{n_2-n_3}$ '$\leq$' $h'_{n_1-n_3}$. (A more precise
and detailed statement can be proven by induction on $n_2-n_1$, but is omitted
here.)


The next lemma describes the form of the equivalence class holding
a chunk $x = (g,q,i,n)$. Using the cuts, equivalent chunks can be
found backwards up to position $n-q+1$ and forward up to position
$n-q+\#g$. These chunks together, $(g,1,i_{n-q+1},n-q+1)$, ..., $(g,q,i_n,n)$,
..., $(g,\#g,i_{n-q+\#g},n-q+\#g)$ span the guide $g$ that is to be applied, in
the rewrite sequence to be constructed.

Lemma 2. Let $\sigma = (s\ell_n)_{n=1}^{\#u}$ be a slice sequence for a string $u$. Let
$X = \{ (g_{n,i}, q_{n,i}, i, n) \mid 1 \leq n \leq \#u,\ i \in I_n \}$ be the set of chunks and choose
$x \in X$, say $x = (g,q,i,n)$. Put $p = n-q$. Then there exist $j_1 \in I_{p+1}, \ldots,
j_{\#g} \in I_{p+\#g}$ such that $[x] = \{ (g,s,j_s,p+s) \mid 1 \leq s \leq \#g \}$.

Proof. It holds that $(g,q,i,n-1) \equiv (g',q',i',n)$ iff $g = g'$, $q = q'-1$,
and $i = \gamma_{n-1}^{-1}(i')$ where $\gamma_{n-1} : I_{n-1} \to I_n$ is the cut for $s\ell_{n-1}$ and $s\ell_n$,
while $(g,q,i,n) \equiv (g',q',i',n+1)$ iff $g = g'$, $q+1 = q'$, and $\gamma_n(i) = i'$,
where $\gamma_n : I_n \to I_{n+1}$ is the cut for $s\ell_n$ and $s\ell_{n+1}$. So, choose
$j_s = (\gamma_{p+s}^{-1} \circ \cdots \circ \gamma_{n-1}^{-1})(i)$ for $1 \leq s \leq q$, and
$j_s = (\gamma_{p+s-1} \circ \cdots \circ \gamma_n)(i)$ for $q \leq s \leq \#g$.


We are now in a position to prove the reverse of Theorem 3. 


Theorem 4. Let $\sigma$ be a slice sequence for a string $u$. Then there exists a
guided rewrite sequence $\varrho$ for $u$ such that $yield(\varrho) = yield(\sigma)$.

Proof. Suppose $\sigma = (s\ell_n)_{n=1}^{\#u}$, $s\ell_n = (g_{n,i}, q_{n,i})_{i \in I_n}$, for $n = 1, \ldots, \#u$, and
let $X = \{ (g_{n,i}, q_{n,i}, i, n) \mid 1 \leq n \leq \#u,\ i \in I_n \}$ be the corresponding set of
chunks. We proceed by induction on $\#X$. Basis, $\#X = 0$: In this case every
slice is empty and $yield(\sigma) = yield(s\ell_1) \cdots yield(s\ell_{\#u}) = u[1] \cdots u[\#u] = u$
and the empty guided rewrite sequence for $u$ has also yield $u$.

Induction step, $\#X > 0$: Clearly, $X/{\equiv}$ is finite and therefore we can
choose, by Lemma 1, $x \in X$ such that $[x]$ is maximal in $X/{\equiv}$. By Lemma 2
we can assume $[x] = \{ (g,s,i_s,p+s) \mid 1 \leq s \leq \#g \}$ for suitable $p$ and indexes
$i_s \in I_{p+s}$, for $s = 1, \ldots, \#g$. Note, by maximality of $[x]$, the indexes $i_s$ must
be the maximum of $I_{p+s}$. In particular, $yield(\sigma)[p+s] = yield(s\ell_{p+s}) = g[s]$,
for $s = 1, \ldots, \#g$.

Now, consider the slice sequence $\sigma' = (s\ell'_n)_{n=1}^{\#u}$ where

\[ s\ell'_n = s\ell_n \quad \text{for } n = 1, \ldots, p \text{ and } n = p+\#g+1, \ldots, \#u \]
\[ s\ell'_n = (g_{n,i}, q_{n,i})_{i \in I_n \setminus \{ i_{n-p} \}} \quad \text{for } n = p+1, \ldots, p+\#g \]

So, the slice sequence $\sigma'$ is obtained from the slice sequence $\sigma$ by leaving out
the guide-offset pairs related to the particular occurrence of $g$.

Let $X'$ be the set of chunks of $\sigma'$. Then $\#X' < \#X$. By induction
hypothesis we can find a guided rewrite sequence $\varrho' = (g'_k, p'_k)_{k=1}^{r}$ for $u$
such that $yield(\varrho') = yield(\sigma')$. Define the guided rewrite sequence
$\varrho = (g_k, p_k)_{k=1}^{r+1}$ by $g_k = g'_k$, $p_k = p'_k$, for $1 \leq k \leq r$, and $g_{r+1} = g$, $p_{r+1} = p$.
We have $0 \leq p \leq \#u-\#g$ and $u[p+1, p+\#g] \sim g$ since $s\ell_{p+1}, \ldots, s\ell_{p+\#g}$ are
slices for $u[p+1], \ldots, u[p+\#g]$, respectively. So, $\varrho$ is a well-defined guided
rewrite sequence for $u$.

It holds that $yield(\varrho') \Rightarrow_{g,p} yield(\varrho)$ as $\varrho$ extends $\varrho'$ with the pair $(g,p)$.
Therefore,

\[ yield(\varrho)[n] = yield(\varrho')[n] \quad \text{for } n = 1, \ldots, p \text{ and } n = p+\#g+1, \ldots, \#u \]
\[ yield(\varrho)[n] = g[n-p] \quad \text{for } n = p+1, \ldots, p+\#g \]

From this it follows, for any index $n$, $1 \leq n \leq p$ or $p+\#g+1 \leq n \leq \#u$,
that $yield(\varrho)[n] = yield(\varrho')[n] = yield(\sigma')[n] = yield(\sigma)[n]$, and for any
index $n$, $p+1 \leq n \leq p+\#g$, that $yield(\varrho)[n] = g[n-p] = yield(\sigma)[n]$. As
$\#yield(\varrho) = \#yield(\sigma) = \#u$, we obtain $yield(\varrho) = yield(\sigma)$, as was to
be shown.
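The induction in this proof amounts to repeatedly peeling off a guide occurrence that lies on top of every slice it touches, and applying the peeled guides in reverse order. The sketch below makes a simplifying assumption that is not part of the theorem: each guide occurrence carries one index across all its slices (so all cuts are identities, as in Table 1), which lets an equivalence class be identified by its index. The names `peel` and `apply_all` are illustrative, and `apply_all` does not re-check the adjustment relation.

```python
GUIDES = {"g1": "fb", "g2": "ace", "g3": "d"}
# Table 1: per position a map index -> (guide name, offset in guide).
SLICES = [
    {2: ("g1", 1), 4: ("g1", 1)},
    {2: ("g1", 2), 3: ("g2", 1), 4: ("g1", 2)},
    {1: ("g3", 1), 3: ("g2", 2)},
    {3: ("g2", 3), 5: ("g1", 1), 6: ("g1", 1)},
    {5: ("g1", 2), 6: ("g1", 2)},
]

def peel(slices):
    """Return a rewrite sequence [(guide name, offset)] with the yield of the slices."""
    slices = [dict(sl) for sl in slices]          # work on a copy
    peeled = []
    while any(slices):                            # some slice is non-empty
        tops = sorted({max(sl) for sl in slices if sl})
        # maximal occurrence: on top of every slice in which it appears
        idx = next(i for i in tops
                   if all(max(sl) == i for sl in slices if i in sl))
        positions = [n for n, sl in enumerate(slices) if idx in sl]
        name = slices[positions[0]][idx][0]
        peeled.append((name, positions[0]))       # offset p = first 0-based position
        for n in positions:
            del slices[n][idx]
    return peeled[::-1]                           # last peeled is applied first

def apply_all(u, sequence):
    """Apply the guides in order (no adjustment check, for brevity)."""
    for name, p in sequence:
        g = GUIDES[name]
        u = u[:p] + g + u[p + len(g):]
    return u

RHO = peel(SLICES)
print(RHO)
print(apply_all("ebcfa", RHO))
```

Different tie-breaking among maximal occurrences gives different linearizations; this sketch always takes the smallest eligible index.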


For the slice sequence $(s\ell_n)_{n=1}^{5}$ of Figure 2 we have the following equivalence
classes of chunks:

$G_3 = \{ (g_3,1,1,3) \}$                     $G_2 = \{ (g_2,1,3,2), (g_2,2,3,3), (g_2,3,3,4) \}$
$G_1^1 = \{ (g_1,1,2,1), (g_1,2,2,2) \}$      $G_1^3 = \{ (g_1,1,5,4), (g_1,2,5,5) \}$
$G_1^2 = \{ (g_1,1,4,1), (g_1,2,4,2) \}$      $G_1^4 = \{ (g_1,1,6,4), (g_1,2,6,5) \}$

Moreover, $G_3, G_1^1 \preceq G_2$, $G_2 \preceq G_1^2$ and $G_2 \preceq G_1^3 \preceq G_1^4$. A possible
linearization is $G_3 \preceq G_1^1 \preceq G_2 \preceq G_1^3 \preceq G_1^4 \preceq G_1^2$. This corresponds to the
rewrite sequence

\[ ebcfa \Rightarrow_{g_3,2} ebdfa \Rightarrow_{g_1,0} fbdfa \Rightarrow_{g_2,1} facea \Rightarrow_{g_1,3} facfb \Rightarrow_{g_1,3} facfb \Rightarrow_{g_1,0} fbcfb \]

Note that the yield $fbcfb$ of this rewrite sequence is the same as the yield of
the sequence (1) of Example 1. However, here the second rewrite with $g_1$
of (1) has been moved to the end. This does not affect the end result as
the particular rewrites do not overlap.


6 Guided Rewriting Preserves Regularity 


Given a language $L$ and a set of guides $G$, according to Definition 1, the
language $L_G$ is given as the set $\{ v \in \Sigma^* \mid \exists u \in L : u \Rightarrow^* v \}$. Theorem 2,
formulated in Section 4, states that if $L$ is regular then $L_G$ is regular too. We
will prove the theorem by constructing a non-deterministic finite automaton
accepting $L_G$ from a deterministic finite automaton accepting $L$. The proof
exploits the correspondence of rewrite sequences and slice sequences, as
captured by Theorem 3 and Theorem 4. First we need an auxiliary result to
assure finiteness of the automaton for $L_G$.


Lemma 3. Let $G$ be a finite set of guides. Let $Z = \{ s\ell \mid
\exists a \in \Sigma : s\ell$ repetition-free slice for $a$ with respect to $G \}$. Then $Z$ is finite.
Moreover, for every string $u$ and every rewrite sequence $\varrho$ for $u$, there
exists a slice sequence $\sigma$ for $u$ consisting of slices from $Z$ only such that
$yield(\sigma) = yield(\varrho)$.


Proof. Recall, a slice $s\ell = (g_i, q_i)_{i \in I}$ is repetition-free if, for $i_1, i_2 \in I$,
$g_{i_1} = g_{i_2} \land q_{i_1} = q_{i_2}$ implies $i_1 = i_2$. Therefore, finiteness of $Z$ is immediate:
there are finitely many guide-offset pairs $(g,q)$, hence finitely many
repetition-free finite sequences of them. Thus, there are only finitely many
repetition-free slices.

Now, let $\varrho$ be a rewrite sequence for a string $u$. By Theorem 3 we
can choose a slice sequence $\sigma'$ such that $yield(\sigma') = yield(\varrho)$. Suppose
$\sigma' = (s\ell_n)_{n=1}^{\#u}$ and $s\ell_n = (g_{n,i}, q_{n,i})_{i \in I_n}$, for $n = 1, \ldots, \#u$. By Lemma 2 it
follows that given a repeated guide-offset pair $(g,q)$, say $(g,q) = (g_{n,i}, q_{n,i})$
and $(g,q) = (g_{n,j}, q_{n,j})$ for indexes $i < j$ in $I_n$, we can delete the complete
equivalence class of $(g_{n,i}, q_{n,i}, i, n)$ from slices $s\ell_{n-q+1}$ to $s\ell_{n-q+\#g}$, while
retaining a slice sequence. (In fact, we are removing the 'lower' occurrence
of the guide $g$.) Moreover, the resulting slice sequence has the same yield
as for all slices the topmost guide-offset pair remains untouched. The
existence of a repetition-free slice sequence $\sigma$ such that $yield(\sigma) = yield(\sigma')$,
hence $yield(\sigma) = yield(\varrho)$, then follows by induction on the number of
repetitions.
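The pruning step of this proof can be tried out on the slice sequence of Table 1. The sketch assumes, as there, that one index identifies one guide occurrence across its slices, so that removing an index removes a complete equivalence class; `find_repeated` and `prune_repetitions` are illustrative names (the walrus operator requires Python 3.8 or later).

```python
GUIDES = {"g1": "fb", "g2": "ace", "g3": "d"}
# Table 1 as lists of (index, guide name, offset in guide).
SLICES = [
    [(2, "g1", 1), (4, "g1", 1)],
    [(2, "g1", 2), (3, "g2", 1), (4, "g1", 2)],
    [(1, "g3", 1), (3, "g2", 2)],
    [(3, "g2", 3), (5, "g1", 1), (6, "g1", 1)],
    [(5, "g1", 2), (6, "g1", 2)],
]

def find_repeated(slices):
    """Index of a lower occurrence whose (guide, offset) pair reappears above it."""
    for sl in slices:
        seen = {}
        for idx, name, q in sorted(sl):
            if (name, q) in seen:
                return seen[(name, q)]
            seen[(name, q)] = idx
    return None

def prune_repetitions(slices):
    """Drop lower occurrences until every slice is repetition-free."""
    slices = [list(sl) for sl in slices]
    while (victim := find_repeated(slices)) is not None:
        slices = [[c for c in sl if c[0] != victim] for sl in slices]
    return slices

def slice_yield(u, slices):
    """Topmost pair of each slice decides the letter; empty slices keep u's letter."""
    return "".join(GUIDES[max(sl)[1]][max(sl)[2] - 1] if sl else a
                   for a, sl in zip(u, slices))

PRUNED = prune_repetitions(SLICES)
print([[i for i, _, _ in sl] for sl in PRUNED])
print(slice_yield("ebcfa", PRUNED))
```

Only lower occurrences are removed, so the topmost pair of every slice, and hence the yield, is unchanged.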


As a corollary we obtain that every rewrite sequence $\varrho$ has a repetition-free
equivalent $\varrho'$.
We are now prepared to prove that guided rewriting preserves regularity. 


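Proof of Theorem 2. Without loss of generality $\varepsilon \notin L$. Let $M =
(\Sigma, Q, \to, q_0, F)$ be a DFA accepting $L$. We define the NFA $M' =
(\Sigma, Q', \to', q_0, F')$ as follows: Let $q_F$ be a fresh state. Put $Q' = Q \cup (Q \times Z) \cup
\{ q_F \}$ with $Z$ as given by Lemma 3, $F' = \{ q_F \}$ and

\[ q_0 \xrightarrow{\varepsilon}{}' q_0 \times \zeta \quad \text{if } \zeta \text{ is a start slice} \]
\[ q \times \zeta \xrightarrow{b}{}' q' \times \zeta' \quad \text{if } q \xrightarrow{a} q',\ a \sim \zeta,\ yield(\zeta) = b,\ \zeta \leadsto \zeta' \]
\[ q \times \zeta \xrightarrow{b}{}' q_F \quad \text{if } \exists q' : q \xrightarrow{a} q' \in F,\ a \sim \zeta,\ yield(\zeta) = b,\ \zeta \text{ is an end slice} \]

Note, by Lemma 3, $Q'$ is a finite set of states. The automaton $M'$ has
only one final state, viz. $q_F$. In the second type of transition, say with
$\zeta = (g_i, q_i)_{i \in I}$ and $\zeta' = (g'_j, q'_j)_{j \in J}$, the requirement $\zeta \leadsto \zeta'$ implies the
existence of a cut $\gamma : I \to J$ in the sense of Definition 4. Thus in a way, the
slice $\zeta'$ is a follow-up of the slice $\zeta$.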

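Suppose $v \in L_G$. Then there exist $u = a_1 \cdots a_s \in L$, a rewrite sequence
$\varrho = (g_k, p_k)_{k=1}^{r}$ and strings $u_0, u_1, \ldots, u_r$ such that $u = u_0$, $u_{k-1} \Rightarrow_{g_k,p_k} u_k$
for $k = 1, \ldots, r$, and $v = u_r$. By Theorem 3 there exists a slice sequence
that is equivalent to $\varrho$. Therefore, by Lemma 3, we can assume that a slice
sequence $\sigma$ for $u$ exists with repetition-free slices and such that $yield(\sigma) =
yield(\varrho)$. Say $\sigma = (s\ell_n)_{n=1}^{s}$ and $s\ell_n = (g_{n,i}, q_{n,i})_{i \in I_n}$, for $n = 1, \ldots, s$. Let
$q_0 \xrightarrow{a_1} q_1 \cdots \xrightarrow{a_s} q_s \in F$ be an accepting computation of $M$ for $u$. Then
$q_0 \xrightarrow{\varepsilon}{}' q_0 \times s\ell_1 \xrightarrow{b_1}{}' \cdots\ q_{s-1} \times s\ell_s \xrightarrow{b_s}{}' q_F$ is an accepting computation of $M'$,
where $b_n = yield(s\ell_n)$, $1 \leq n \leq s$. Since we have $b_1 \cdots b_s = yield(s\ell_1) \cdots
yield(s\ell_s) = yield(\sigma) = v$, it follows that $v \in L(M')$. So, $L_G \subseteq L(M')$.

Let $v = b_1 \cdots b_s$ be a string in $L(M')$. Given the definition of the
transition relation of $M'$, we can find states $q_0, q_1, \ldots, q_{s-1}$, repetition-free
slices $s\ell_1, \ldots, s\ell_s$ such that $s\ell_n \leadsto s\ell_{n+1}$ for $n = 1, \ldots, s-1$, and a
computation $q_0 \xrightarrow{\varepsilon}{}' q_0 \times s\ell_1 \xrightarrow{b_1}{}' \cdots\ q_{s-1} \times s\ell_s \xrightarrow{b_s}{}' q_F$. Thus, there exist a
final state $q_s$ and a computation $q_0 \xrightarrow{a_1} q_1 \cdots q_{s-1} \xrightarrow{a_s} q_s \in F$ such that
$a_n \sim s\ell_n$ for $n = 1, \ldots, s$, i.e. $s\ell_n$ is a slice for $a_n$. Put $u = a_1 \cdots a_s$. Then
$u \in L$, $\sigma = (s\ell_n)_{n=1}^{s}$ is a slice sequence for $u$ and $yield(\sigma) = v$. By Theorem 4
we can find a rewrite sequence $\varrho$ for $u$ such that $yield(\varrho) = yield(\sigma) = v$.
It follows that $u \Rightarrow^* v$ and $v \in L_G$. Thus, $L(M') \subseteq L_G$. We conclude that
$L_G = L(M')$ and regularity of $L_G$ follows.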
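For a finite language the closure $L_G$ can also be computed by brute force, since guided rewriting never changes the length of a string. The sketch below is not the automaton construction of the proof; it is a plain breadth-first search over rewrite steps, using the adjustment relation and guides of Example 1 as assumed input.

```python
from collections import deque

CLASSES = [set("ab"), set("cd"), set("ef")]

def adjustable(x, y):
    """True if the letters x and y lie in the same class of ~."""
    return any(x in cls and y in cls for cls in CLASSES)

def closure(language, guides):
    """All strings reachable from `language` by zero or more guided rewrite steps."""
    seen = set(language)
    todo = deque(seen)
    while todo:
        u = todo.popleft()
        for g in guides:
            for p in range(len(u) - len(g) + 1):
                if all(adjustable(a, b) for a, b in zip(u[p:p + len(g)], g)):
                    v = u[:p] + g + u[p + len(g):]
                    if v not in seen:
                        seen.add(v)
                        todo.append(v)
    return seen

L_G = closure({"ebcfa"}, ["fb", "ace", "d"])
print("fbcfb" in L_G, len(L_G))
```

The search terminates because only finitely many strings of a given length exist over a finite alphabet; for infinite regular $L$ the automaton $M'$ is needed instead.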


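As a soundness check, observe that, since $L \subseteq L_G$, the automaton $M'$ should accept any
word $a_1 \cdots a_s \in L$, $s > 0$. This can be verified as follows. Let $\zeta_i$ be the empty
slice yielding $a_i$, $i = 1, \ldots, s$. Then $a_i \sim \zeta_i$, i.e. $a_i = yield(\zeta_i)$, which holds
by definition. Moreover, $\zeta_1$ is a start slice, $\zeta_i \leadsto \zeta_{i+1}$ for $i = 1, \ldots, s-1$,
and $\zeta_s$ is an end slice. It follows that we can turn an accepting computation
of $M$, say $q_0 \xrightarrow{a_1} q_1 \xrightarrow{a_2} \cdots \xrightarrow{a_s} q_s \in F$, into an accepting computation of $M'$:

\[ q_0 \xrightarrow{\varepsilon}{}' q_0 \times \zeta_1 \xrightarrow{a_1}{}' q_1 \times \zeta_2 \xrightarrow{a_2}{}' \cdots \xrightarrow{a_{s-1}}{}' q_{s-1} \times \zeta_s \xrightarrow{a_s}{}' q_F. \]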


7 Insertion-Deletion Preserves Regularity 


This section provides the proof of Theorem 1, regularity of $L$ implies regularity
of $L_{i/d}$, exploiting Theorem 2, regularity of $L$ implies regularity of $L_G$. For
the latter theorem to apply we need a preparatory transformation. The
point is, in the setting of guided insertion/deletion of Section 3 strings are allowed
to grow or shrink while guided insertions and deletions are being applied,
whereas in the setting of guided rewriting of Section 4 the strings do not change
length.

The key idea of the transformation is that every group of 0's is compressed
to a single symbol. So, let us fix for the remainder of this section a
regular language $L$ over $\Sigma_0 = \Sigma \cup \{0\}$ and a set of guides $G \subseteq \Sigma \cup \Sigma \cdot \Sigma_0^* \cdot \Sigma$.
Let $N$ be the maximum number of consecutive 0's occurring in the elements
of $G$. Then we introduce $N+2$ fresh symbols $0_0, 0_1, \ldots, 0_N, 0_+$ and put
$O = \{ 0_0, 0_1, \ldots, 0_N, 0_+ \}$.


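For a string $u$ over $\Sigma_0$ we define the string $\bar{u}$ over the alphabet $\bar{\Sigma} = \Sigma \cup O$.
The string $\bar{u}$ is obtained from $u$ by replacing every maximal pattern $0^i$
by the single symbol $0_i$, in case $i \leq N$, and by $0_+$ in case $i > N$. More
precisely, if $a_1, \ldots, a_n \in \Sigma$ and $u = 0^{k_0} a_1 0^{k_1} a_2 \cdots a_n 0^{k_n}$ with $k_i \geq 0$ for
$i = 0, \ldots, n$, then $\bar{u} = 0_{p_0} a_1 0_{p_1} a_2 \cdots a_n 0_{p_n}$ where $p_i = k_i$ if $0 \leq k_i \leq N$ and
$p_i = +$ if $k_i > N$, for $i = 0, \ldots, n$. For such $u = 0^{k_0} a_1 0^{k_1} a_2 \cdots a_n 0^{k_n}$ we
write $zeros(u,i) = k_i$ for $i = 0, \ldots, n$. For a set $V$ of strings over $\Sigma_0$, we put

$\bar{V} = \{ \bar{v} \mid v \in V \}$.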
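The compression $u \mapsto \bar{u}$ is easy to make concrete. In the sketch below a compressed string is represented as a list of symbols, where the strings `"0_k"` and `"0_+"` stand for $0_k$ and $0_+$; this encoding and the names `compress` and `zeros` are choices made for illustration only. Letters of $\Sigma$ are assumed to be single characters different from `0`.

```python
import re

def zero_groups(u):
    """Maximal (possibly empty) runs of 0's before, between and after the letters of u."""
    return re.split(r"[^0]", u)

def zeros(u, i):
    """zeros(u, i): the length k_i of the i-th group of 0's, counted from i = 0."""
    return len(zero_groups(u)[i])

def compress(u, N):
    """Return the compressed form of u as a list of symbols."""
    groups = zero_groups(u)
    letters = [c for c in u if c != "0"]
    out = []
    for i, grp in enumerate(groups):
        k = len(grp)
        out.append(f"0_{k}" if k <= N else "0_+")
        if i < len(letters):
            out.append(letters[i])
    return out

print(compress("10000212", 1))
print(compress("100002012", 1))
```

Note that empty groups are represented explicitly by `"0_0"`, so every compressed string alternates between symbols of $O$ and letters of $\Sigma$, starting and ending with a symbol of $O$.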
Lemma 4. If $L \subseteq \Sigma_0^*$ is regular, then $\bar{L} \subseteq \bar{\Sigma}^*$ is regular as well.


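Proof. Let $M = (Q, \Sigma_0, \delta, q_0, F)$ be an NFA accepting $L$. Obtain the NFA $M'$
from $M$ by putting $M' = (Q, \Sigma \cup O, \delta', q_0, F)$ where

\[ \delta'(q, a) = \delta(q, a) \quad \text{for } a \in \Sigma \cup \{\varepsilon\} \]
\[ \delta'(q, 0_i) = \{ q' \in Q \mid q \stackrel{0^i}{\Longrightarrow} q' \} \quad \text{for } 0 \leq i \leq N \]
\[ \delta'(q, 0_+) = \{ q' \in Q \mid \exists k > N : q \stackrel{0^k}{\Longrightarrow} q' \} \]

In particular, we have $q \xrightarrow{0_0}{}' q$ for all $q \in Q$. We claim $\bar{L} = L(M') \cap
O \cdot (\Sigma \cdot O)^*$.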

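Pick $\bar{u} \in \bar{L}$. Suppose $u = 0^{k_0} a_1 0^{k_1} \cdots a_n 0^{k_n} \in L$ with $0 \leq k_i$ for
$i = 0, \ldots, n$. Then we have $\bar{u} = 0_{p_0} a_1 0_{p_1} \cdots a_n 0_{p_n}$, for suitable indices
$p_i \in \{ 0, \ldots, N, + \}$. Let

\[ q_0 \stackrel{0^{k_0}}{\Longrightarrow} q'_1 \xrightarrow{a_1} q_1 \stackrel{0^{k_1}}{\Longrightarrow} \cdots\ q'_n \xrightarrow{a_n} q_n \stackrel{0^{k_n}}{\Longrightarrow} q_{n+1} \in F \]

be an accepting computation of $M$ for $u$. Then

\[ q_0 \xrightarrow{0_{p_0}}{}' q'_1 \xrightarrow{a_1}{}' q_1 \xrightarrow{0_{p_1}}{}' \cdots\ q'_n \xrightarrow{a_n}{}' q_n \xrightarrow{0_{p_n}}{}' q_{n+1} \in F \]

is an accepting computation of $M'$ for $\bar{u}$. So $\bar{u} \in L(M')$. Clearly, $\bar{u} \in
O \cdot (\Sigma \cdot O)^*$. Thus $\bar{L} \subseteq L(M') \cap O \cdot (\Sigma \cdot O)^*$.

Figure 3: Example automaton construction as used in Lemma 4, with N = 2

Conversely, pick $v \in L(M') \cap O \cdot (\Sigma \cdot O)^*$. Say $v = 0_{p_0} a_1 0_{p_1} \cdots a_n 0_{p_n}$, for
some $n \geq 0$, $p_0, \ldots, p_n \in \{ 0, \ldots, N, + \}$ and $a_1, \ldots, a_n \in \Sigma$. Since $v \in L(M')$
there exists an accepting computation

\[ q_0 \xrightarrow{0_{p_0}}{}' q'_1 \xrightarrow{a_1}{}' q_1 \xrightarrow{0_{p_1}}{}' \cdots\ q'_n \xrightarrow{a_n}{}' q_n \xrightarrow{0_{p_n}}{}' q_{n+1} \in F \]

of $M'$ for $v$. Then, by construction of $M'$, there also exists an accepting
computation

\[ q_0 \stackrel{0^{k_0}}{\Longrightarrow} q'_1 \xrightarrow{a_1} q_1 \stackrel{0^{k_1}}{\Longrightarrow} \cdots\ q'_n \xrightarrow{a_n} q_n \stackrel{0^{k_n}}{\Longrightarrow} q_{n+1} \in F \]

of $M$ for suitable indices $k_0, \ldots, k_n$ such that $k_i = p_i$ if $p_i \in \{ 0, \ldots, N \}$,
and $k_i > N$ if $p_i = +$, for $i = 0, \ldots, n$. Therefore, $u = 0^{k_0} a_1 0^{k_1} \cdots a_n 0^{k_n} \in
L(M)$, i.e. $u \in L$. Moreover, by the correspondence of $k_0, \ldots, k_n$ and
$p_0, \ldots, p_n$, respectively, it holds that $\bar{u} = v$, hence $v \in \bar{L}$. Thus $L(M') \cap
O \cdot (\Sigma \cdot O)^* \subseteq \bar{L}$.

Conclusion: $\bar{L} = L(M') \cap O \cdot (\Sigma \cdot O)^*$ and $\bar{L}$ is regular, being the
intersection of two regular languages.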


The construction from the proof is illustrated in Figure 3.
We consider the adjustment relation $\sim_0$ on $\bar{\Sigma}$ defined by

\[ a \sim_0 b \iff a = b \lor (a \in O \land b \in O) \]

and guided rewriting $\Rightarrow$ on $\bar{\Sigma}$ with respect to $\sim_0$ and the set of guides $\bar{G}$.
However, for elements of $\bar{G}$ the leading and trailing $0_0$'s are removed, so

\[ \bar{G} = \{ a_1 0_{k_1} a_2 \cdots 0_{k_{n-1}} a_n \mid a_1 0^{k_1} a_2 \cdots 0^{k_{n-1}} a_n \in G \} \]

Note, the correspondence of the index $k_i$ in $0_{k_i}$ and the index $k_i$ in $0^{k_i}$ is
literal, since always $0 \leq k_i \leq N$. Moreover, if, for $u, v \in \Sigma_0^*$, we have
$\bar{u} \sim_0 \bar{v}$, then also $\pi(u) = \pi(v)$.
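A rewrite step with respect to $\sim_0$ can be sketched on symbol lists as follows. The encoding of $0_k$ and $0_+$ as the strings `"0_k"` and `"0_+"`, the sample inputs, and the function names are assumptions made for this illustration; letters of $\Sigma$ are taken to be single characters.

```python
def sim0(a, b):
    """a ~0 b: equal symbols, or both symbols of O (encoded as '0_...')."""
    return a == b or (a.startswith("0_") and b.startswith("0_"))

def rewrite_step(u, g, p):
    """Guided rewriting on symbol lists; returns None if g does not match at offset p."""
    window = u[p:p + len(g)]
    if len(window) != len(g) or not all(sim0(a, b) for a, b in zip(window, g)):
        return None
    return u[:p] + g + u[p + len(g):]

# Compressed form of 10000212 for N = 1, and the compressed guide 2 0_1 1.
U_BAR = ["0_0", "1", "0_+", "2", "0_0", "1", "0_0", "2", "0_0"]
G_BAR = ["2", "0_1", "1"]

print(rewrite_step(U_BAR, G_BAR, 3))   # the 0_0 between 2 and 1 becomes 0_1
print(rewrite_step(U_BAR, G_BAR, 1))   # None: 1 and 2 are not related by ~0
```

Since all symbols of $O$ are related to each other, a single rewrite step can model an insertion ($0_0$ to $0_1$) as well as a deletion ($0_+$ to $0_1$) of dummy symbols.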


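Lemma 5. Let $u, v \in \Sigma_0^*$.
(a) If $u \Rightarrow_{i/d} v$ then $\bar{u} \Rightarrow \bar{v}$.

(b) Conversely, if $\bar{u} \Rightarrow \bar{v}$, and $zeros(v,i) > N$ implies $zeros(v,i) =
zeros(u,i)$, for all $i$, then $u \Rightarrow_{i/d} v$.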


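Proof. (a) By definition, if $u \Rightarrow_{i/d} v$ with respect to $G$, then there exist
strings $x, z \in \Sigma_0^*$, $y \in \Sigma \cdot \Sigma_0^* \cdot \Sigma$, and $g \in G$ such that $\pi(y) = \pi(g)$, $u = xyz$
and $v = xgz$. Say

\[ x = 0^{k_0} a_1 0^{k_1} \cdots a_s 0^{k_s}, \quad y = b_1 0^{\ell_1} \cdots 0^{\ell_{r-1}} b_r, \]
\[ z = 0^{m_0} c_1 0^{m_1} \cdots c_t 0^{m_t}, \quad \text{and} \quad g = b_1 0^{\ell'_1} \cdots 0^{\ell'_{r-1}} b_r \]

for $s, t \geq 0$, exponents $k_i, \ell_j, \ell'_j, m_i \geq 0$ and letters
$a_1, \ldots, a_s, b_1, \ldots, b_r, c_1, \ldots, c_t \in \Sigma$. Then we have

\[ \bar{x} = 0_{pk_0} a_1 0_{pk_1} \cdots a_s 0_{pk_s}, \quad \bar{y} = b_1 0_{p\ell_1} \cdots 0_{p\ell_{r-1}} b_r, \]
\[ \bar{z} = 0_{pm_0} c_1 0_{pm_1} \cdots c_t 0_{pm_t}, \quad \text{and} \quad \bar{g} = b_1 0_{p\ell'_1} \cdots 0_{p\ell'_{r-1}} b_r \]

for suitable indices $pk_i, p\ell_j, p\ell'_j, pm_i \in \{ 0, \ldots, N, + \}$. By definition of $\sim_0$
we have $0_{p\ell_j} \sim_0 0_{p\ell'_j}$ for $1 \leq j < r$. Hence $\bar{y} \sim_0 \bar{g}$ and
$\bar{u} = \bar{x}\bar{y}\bar{z} \sim_0 \bar{x}\bar{g}\bar{z} = \bar{v}$, so $\bar{u} \Rightarrow \bar{v}$.

(b) Suppose

\[ u = 0^{k_0} a_1 0^{k_1} \cdots a_n 0^{k_n} \quad \text{and} \quad v = 0^{\ell_0} b_1 0^{\ell_1} \cdots b_m 0^{\ell_m} \]

for $n, m \geq 0$, $k_0, \ldots, k_n, \ell_0, \ldots, \ell_m \geq 0$ and $a_1, \ldots, a_n, b_1, \ldots, b_m \in \Sigma$. Then

\[ \bar{u} = 0_{pk_0} a_1 0_{pk_1} \cdots a_n 0_{pk_n} \quad \text{and} \quad \bar{v} = 0_{p\ell_0} b_1 0_{p\ell_1} \cdots b_m 0_{p\ell_m}, \]

for suitable $pk_i, p\ell_j \in \{ 0, \ldots, N, + \}$, $i = 0, \ldots, n$, $j = 0, \ldots, m$. Assuming
$\bar{u} \Rightarrow \bar{v}$ with respect to $\sim_0$, we have $n = m$, $a_i = b_i$ for $1 \leq i \leq n$. Moreover,
there exist indices $r$ and $s$, $1 \leq r \leq s \leq n$ such that

\[ 0_{pk_0} a_1 0_{pk_1} \cdots a_{r-1} 0_{pk_{r-1}} = 0_{p\ell_0} a_1 0_{p\ell_1} \cdots a_{r-1} 0_{p\ell_{r-1}} \]
\[ a_r 0_{pk_r} \cdots 0_{pk_{s-1}} a_s \sim_0 a_r 0_{p\ell_r} \cdots 0_{p\ell_{s-1}} a_s \]
\[ a_r 0_{p\ell_r} \cdots 0_{p\ell_{s-1}} a_s \in \bar{G} \]
\[ 0_{pk_s} a_{s+1} \cdots a_n 0_{pk_n} = 0_{p\ell_s} a_{s+1} \cdots a_n 0_{p\ell_n} \]

It follows that $pk_i = p\ell_i$ for $0 \leq i < r$ and $pk_j = p\ell_j$ for $s \leq j \leq n$.
If $pk_i \neq +$ or $pk_j \neq +$ this implies $k_i = \ell_i$ and $k_j = \ell_j$. In view of the
additional assumption that $zeros(v,i) > N$ implies $zeros(v,i) = zeros(u,i)$
for $0 \leq i \leq n$, it follows that $k_i = \ell_i$ for all $0 \leq i < r$ and $k_j = \ell_j$ for all
$s \leq j \leq n$. Now, put

\[ x = 0^{k_0} a_1 0^{k_1} \cdots a_{r-1} 0^{k_{r-1}}, \quad y = a_r 0^{k_r} \cdots 0^{k_{s-1}} a_s, \quad \text{and} \quad z = 0^{k_s} a_{s+1} \cdots a_n 0^{k_n} \]

Choose $g \in G$ such that $\bar{g} = 0_0 a_r 0_{p\ell_r} \cdots 0_{p\ell_{s-1}} a_s 0_0$. Say
$g = a_r 0^{\ell'_r} \cdots 0^{\ell'_{s-1}} a_s$. Since $p\ell_r, \ldots, p\ell_{s-1} \neq +$ it holds that $\ell'_i = p\ell_i = \ell_i$
for $r \leq i < s$, i.e. $g = a_r 0^{\ell_r} \cdots 0^{\ell_{s-1}} a_s$. Thus we have $u = xyz$, $v = xgz$ and
$\pi(y) = \pi(g)$. Hence $u \Rightarrow_{i/d} v$ with respect to $G$.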


Lemma 6. It holds that

\[ L_{i/d} = \{ v \in \Sigma_0^* \mid \exists u \in L : \bar{u} \Rightarrow^* \bar{v} \ \land\ \forall i : (zeros(v,i) > N \to zeros(v,i) = zeros(u,i)) \} \]

where $\Rightarrow$ is the guided rewriting relation with respect to $\bar{G}$.


Proof. ($\subseteq$). Let $v \in L_{i/d}$. Thus, there exists $u \in L$ such that $u \Rightarrow^*_{i/d} v$.
Using the first claim of Lemma 5 we obtain $\bar{u} \Rightarrow^* \bar{v}$. From $u \in L$ we conclude
$\bar{u} \in \bar{L}$ and therefore $\bar{v} \in \bar{L}_{\bar{G}}$. Furthermore, if $zeros(v,i) > N$ then in $u \Rightarrow^*_{i/d} v$
the corresponding group of $zeros(v,i)$ many consecutive 0's is not touched,
so $zeros(v,i) = zeros(u,i)$. This concludes ($\subseteq$).

($\supseteq$). Let $u \in L$ satisfy $\bar{u} \Rightarrow^n \bar{v}$ and $\forall i : (zeros(v,i) > N \to zeros(v,i) =
zeros(u,i))$, for $n \geq 0$. We will prove $u \Rightarrow^*_{i/d} v$ by induction on $n$. For the
base case $n = 0$ this follows from $\bar{u} = \bar{v}$, the definition of the compression mapping
and the assumption $(zeros(v,i) > N \to zeros(v,i) = zeros(u,i))$. For the
induction step, $n > 0$, suppose $\bar{u} \Rightarrow^{n-1} \bar{w} \Rightarrow \bar{v}$. For every $i$, observe if
$zeros(w,i) > N$ then $zeros(u,i) > N$. Now choose $w'$ such that $\bar{w'} = \bar{w}$
and, for all $i$, if $zeros(w,i) > N$ then $zeros(w',i) = zeros(u,i)$, otherwise
$zeros(w',i) = zeros(w,i)$. Applying the induction hypothesis on $\bar{u} \Rightarrow^{n-1} \bar{w'}$
yields $u \Rightarrow^*_{i/d} w'$, and applying Lemma 5 to $\bar{w'} \Rightarrow \bar{v}$ yields $w' \Rightarrow_{i/d} v$, so
$u \Rightarrow^*_{i/d} v$.


A direct consequence of Lemma 6 is the following corollary.

Corollary 1. In the setting above, let $v = 0^{m_0} a_1 0^{m_1} \cdots a_n 0^{m_n} \in \Sigma_0^*$, where
$a_i \in \Sigma$ for $i = 1, \ldots, n$. Then $v \in L_{i/d}$ iff $\bar{v} = 0_{p_0} a_1 0_{p_1} \cdots a_n 0_{p_n} \in \bar{L}_{\bar{G}}$ and
$u = 0^{k_0} a_1 0^{k_1} \cdots a_n 0^{k_n} \in L$ exists such that $k_i = p_i$ if $p_i \in \{ 0, \ldots, N \}$ and
$k_i = m_i$ if $p_i = +$, for $i = 0, \ldots, n$.

Now we are ready to construct, given an NFA $M$ for the language $L$ over $\Sigma_0$,
an NFA $M_{i/d}$ exactly accepting the language $L_{i/d}$.

Suppose $M = (\Sigma_0, Q, \to, q_0, F)$. According to Lemma 4 we have that
$\bar{L}$ is regular. By Theorem 2 we obtain that $\bar{L}_{\bar{G}}$ is regular. So, let $\bar{M} =
(\bar{\Sigma}, \bar{Q}, \to, \bar{q}_0, \bar{F})$ be an NFA accepting $\bar{L}_{\bar{G}}$. According to Lemma 6, $L_{i/d}$
consists of strings $v$ such that $\bar{v} \in \bar{L}_{\bar{G}}$, thus for some $u \in L$ we have $\bar{u} \Rightarrow^* \bar{v}$.
By mapping $v$ to $\bar{v}$ every maximal group of $k$ 0's with $k > N$ is mapped to
$0_+$; the extra requirement for being in $L_{i/d}$ is that the size of such a group
coincides with the size of the corresponding group in the original string
$u \in L$. This leads to the following construction of the NFA $M_{i/d}$ for $L_{i/d}$.


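Definition 6. Suppose $M = (\Sigma_0, Q, \to, q_0, F)$ and $\bar{M} = (\bar{\Sigma}, \bar{Q}, \to, \bar{q}_0, \bar{F})$
are NFA's for the languages $L$ and $\bar{L}_{\bar{G}}$, respectively. Then, the NFA $M_{i/d}$ is
defined as follows:

• the set of states of $M_{i/d}$ is $Q \times \bar{Q} \times \{ 0, \ldots, N \}$

• the transition relation of $M_{i/d}$ is given by

  1. if $q \xrightarrow{a} r$ and $\bar{q} \xrightarrow{a} \bar{r}$ then $(q, \bar{q}, 0) \xrightarrow{a} (r, \bar{r}, 0)$, for $a \in \Sigma$, $q, r \in Q$, $\bar{q}, \bar{r} \in \bar{Q}$

  2. if $q \stackrel{0^*}{\Longrightarrow} r$ (zero or more 0-steps) and $\bar{q} \xrightarrow{0_k} \bar{r}$ then $(q, \bar{q}, 0) \stackrel{0^k}{\Longrightarrow} (r, \bar{r}, 0)$,
     for $q, r \in Q$, $\bar{q}, \bar{r} \in \bar{Q}$, $k \in \{ 0, \ldots, N \}$. More specifically, for $k = 0$ we
     have a transition $(q, \bar{q}, 0) \xrightarrow{\varepsilon} (r, \bar{r}, 0)$, for $k = 1$ we have a transition
     $(q, \bar{q}, 0) \xrightarrow{0} (r, \bar{r}, 0)$, and for $k > 1$ we create $k-1$ fresh states and a path
     consisting of $k$ 0-steps along these fresh states from $(q, \bar{q}, 0)$ to $(r, \bar{r}, 0)$.

  3. if $q \xrightarrow{0} r$ and $\bar{q} \xrightarrow{0_+} \bar{r}$ then $(q, \bar{q}, 0) \xrightarrow{0} (r, \bar{r}, N)$, for $q, r \in Q$, $\bar{q}, \bar{r} \in \bar{Q}$

  4. if $q \xrightarrow{0} r$ then $(q, \bar{r}, N) \xrightarrow{0} (r, \bar{r}, N)$ and $(q, \bar{r}, i) \xrightarrow{0} (r, \bar{r}, i-1)$ for
     $q, r \in Q$, $\bar{r} \in \bar{Q}$ and $i = 1, \ldots, N$

• the initial state of $M_{i/d}$ is $(q_0, \bar{q}_0, 0)$

• the set of final states of $M_{i/d}$ is $F \times \bar{F} \times \{ 0 \}$.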


The idea of this NFA $M_{i/d}$ is that as long as $0_+$ does not come into play, it
exactly mimics the NFA $\bar{M}$ executing an $a$-step, based on rule 1, or replacing
every $0_k$ by $k$ separate 0-steps, based on rule 2. When it has to simulate
a $0_+$-step of $\bar{M}$, it performs a number of 0-steps as following the NFA $M$.
However, it has to be guaranteed that this latter number is indeed more
than $N$. The third component of states of $M_{i/d}$ serves this purpose. When
$M_{i/d}$ takes a transition based on rule 3 above executing a 0-step, the third
component is set to its maximal value $N$. Next, based on the two types of
transitions covered by rule 4, either the value $N$ is maintained to cater for a
sequence of more than $N+1$ zeros, or it is counted down to 0 taking exactly
$N$ steps, yielding $N+1$ zeros at least. We illustrate the behaviour of $M_{i/d}$
by an example.
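The role of the third component can be checked in isolation. The sketch below tracks only the counter, ignoring the two automaton components, and assumes $N \geq 1$; the function name `accepted_run_lengths` is illustrative. It lists the lengths of 0-runs after which the counter can be back at 0.

```python
def accepted_run_lengths(N, limit):
    """Run lengths k <= limit after which the counter can have returned to 0 (N >= 1)."""
    lengths = []
    counters = {N}                     # rule 3: the first 0 sets the counter to N
    for k in range(2, limit + 1):
        step = set()
        for c in counters:
            if c >= 1:
                step.add(c - 1)        # rule 4: count down by one
            if c == N:
                step.add(N)            # rule 4: stay at the maximal value N
        counters = step
        if 0 in counters:
            lengths.append(k)          # k zeros read, counter back at 0
    return lengths

print(accepted_run_lengths(2, 6))
print(accepted_run_lengths(1, 4))
```

In both cases exactly the run lengths $k > N$ are reported, in line with the requirement that a $0_+$ stands for more than $N$ zeros.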


The language $L = L((1(00)^*2)^*)$ is accepted by the NFA $M$:

[Figure: the NFA $M$]

Let $G = \{ 201 \}$. Then we have $N = 1$, and $\bar{L} = L(0_0 (1 (0_0 + 0_+) 2\, 0_0)^*)$. For
closure under $\bar{G}$ every substring of the shape $2\, 0_0\, 1$ of a string in $\bar{L}$ may be
replaced by $2\, 0_1\, 1$, yielding the following NFA $\bar{M}$ accepting $\bar{L}_{\bar{G}}$:

[Figure: the NFA $\bar{M}$]

The automaton $M_{i/d}$ that is constructed as a product of the automata $M$
and $\bar{M}$ is depicted in Figure 4.

For example, the transition $(q_0, \bar{q}_0, 0) \xrightarrow{\varepsilon} (q_0, \bar{q}_1, 0)$ of $M_{i/d}$ is based on
the transition $q_0 \stackrel{0^*}{\Longrightarrow} q_0$ of $M$ and the transition $\bar{q}_0 \xrightarrow{0_0} \bar{q}_1$ of $\bar{M}$ using rule 2,
while the transition $(q_0, \bar{q}_1, 0) \xrightarrow{1} (q_1, \bar{q}_2, 0)$ of $M_{i/d}$ is based on the transitions
$q_0 \xrightarrow{1} q_1$ of $M$ and $\bar{q}_1 \xrightarrow{1} \bar{q}_2$ of $\bar{M}$ using rule 1. The two transitions leaving $(q_1, \bar{q}_2, 0)$
in $M_{i/d}$ correspond to the two transitions for $\bar{q}_2$ in $\bar{M}$, one labeled with $0_0$
and one labeled with $0_+$. The former induces the $\varepsilon$-transition to
state $(q_1, \bar{q}_3, 0)$ based on rule 2, the latter however induces the 0-transition
to state $(q_2, \bar{q}_3, 1)$ where the counter in the third component is set to $N = 1$.
This indicates that at least one more zero needs to be matched. In $M_{i/d}$
this can be done in two ways, either by looping via state $(q_1, \bar{q}_3, 1)$ or by
going to state $(q_1, \bar{q}_3, 0)$ directly. Here, the transitions are based on rule 4 of
Definition 6. Note that state $(q_2, \bar{q}_3, 0)$ is a deadlock state. Finally, in state
$(q_0, \bar{q}_4, 0)$ two transitions are possible again. This reflects that at this point
an insertion/deletion step may take place or not. If so, the computation
continues via state $(q_0, \bar{q}_5, 0)$. If not, the computation proceeds with an
$\varepsilon$-transition to state $(q_0, \bar{q}_1, 0)$.


More concretely, consider the string $10000212 \in L$. It admits an
insertion/deletion step with the guide 201 to the string 100002012. So,
$100002012 \in L_{i/d}$. For accepting 100002012 by $M_{i/d}$ it is essential to rely for
processing the part 100002 on $M$, since with respect to $\bar{\Sigma}$ every group of more
than one consecutive 0's is compressed to $0_+$, by which in $\bar{M}$ the information
is lost that we should have an even number of 0's. This is handled in the
$(q_2, \bar{q}_3, 1)$-$(q_1, \bar{q}_3, 1)$ loop. For the rest of the string the automaton $\bar{M}$ should
be followed, since in $M$ the information that 21 was allowed to be replaced
by 201 is not available. The 0-transition leaving state $(q_0, \bar{q}_4, 0)$ makes
this possible.

Figure 4: Automaton $M_{i/d}$

The resulting accepting transition sequence in $M_{i/d}$ reads:

\[ (q_0, \bar{q}_0, 0) \xrightarrow{\varepsilon} (q_0, \bar{q}_1, 0) \xrightarrow{1} (q_1, \bar{q}_2, 0) \xrightarrow{0} (q_2, \bar{q}_3, 1) \xrightarrow{0} (q_1, \bar{q}_3, 1) \xrightarrow{0} \]
\[ (q_2, \bar{q}_3, 1) \xrightarrow{0} (q_1, \bar{q}_3, 0) \xrightarrow{2} (q_0, \bar{q}_4, 0) \xrightarrow{0} (q_0, \bar{q}_5, 0) \xrightarrow{1} \]
\[ (q_1, \bar{q}_2, 0) \xrightarrow{\varepsilon} (q_1, \bar{q}_3, 0) \xrightarrow{2} (q_0, \bar{q}_4, 0) \xrightarrow{\varepsilon} (q_0, \bar{q}_1, 0) \]

Since $(q_0, \bar{q}_1, 0)$ is a final state in $M_{i/d}$, this shows that $100002012 \in L(M_{i/d})$.
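The example can be replayed with a small on-the-fly simulation of the product construction. The following is a sketch under several assumptions: the two input automata are transcribed from the prose of this example (the figures are not available here), the states of the second automaton are named `b0` to `b5`, the compressed symbols $0_k$ and $0_+$ are encoded as `("0", k)` and `("0", "+")`, the fresh path states of rule 2 are represented by a `("pad", j)` counter, and $N \geq 1$.

```python
N = 1
# M for L((1(00)*2)*): (state, letter) -> successor states.
M = {("q0", "1"): {"q1"}, ("q1", "0"): {"q2"},
     ("q2", "0"): {"q1"}, ("q1", "2"): {"q0"}}
M_FINAL = {"q0"}
# Hypothetical encoding of the second automaton, read off from the running text.
MBAR = {("b0", ("0", 0)): {"b1"}, ("b1", "1"): {"b2"},
        ("b2", ("0", 0)): {"b3"}, ("b2", ("0", "+")): {"b3"},
        ("b3", "2"): {"b4"}, ("b4", ("0", 0)): {"b1"},
        ("b4", ("0", 1)): {"b5"}, ("b5", "1"): {"b2"}}
MBAR_FINAL = {"b1"}

def zero_closure(q):
    """States of M reachable from q by zero or more 0-steps."""
    seen, todo = {q}, [q]
    while todo:
        for r in M.get((todo.pop(), "0"), ()):
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return seen

def successors(state):
    """Outgoing (label, state) pairs of the product; label '' is an epsilon step."""
    q, b, c = state
    if isinstance(c, tuple):                      # inside a rule-2 path
        j = c[1]
        return [("0", (q, b, 0 if j == 1 else ("pad", j - 1)))]
    out = []
    if c == 0:
        for (src, label), targets in MBAR.items():
            if src != b:
                continue
            for rb in targets:
                if isinstance(label, str):        # rule 1: common letter step
                    out += [(label, (r, rb, 0)) for r in M.get((q, label), ())]
                elif label[1] == "+":             # rule 3: start counting at N
                    out += [("0", (r, rb, N)) for r in M.get((q, "0"), ())]
                else:                             # rule 2: read exactly k zeros
                    k = label[1]
                    for r in zero_closure(q):
                        if k == 0:
                            out.append(("", (r, rb, 0)))
                        else:
                            out.append(("0", (r, rb, 0 if k == 1 else ("pad", k - 1))))
    else:                                         # rule 4: counter in 1..N
        for r in M.get((q, "0"), ()):
            out.append(("0", (r, b, c - 1)))
            if c == N:
                out.append(("0", (r, b, N)))
    return out

def eps_closure(states):
    seen, todo = set(states), list(states)
    while todo:
        for label, t in successors(todo.pop()):
            if label == "" and t not in seen:
                seen.add(t)
                todo.append(t)
    return seen

def accepts(word):
    current = eps_closure({("q0", "b0", 0)})
    for letter in word:
        moved = {t for s in current for label, t in successors(s) if label == letter}
        current = eps_closure(moved)
    return any(q in M_FINAL and b in MBAR_FINAL and c == 0 for q, b, c in current)

print(accepts("100002012"), accepts("1000212"))
```

The string with an odd number of 0's between 1 and 2 is rejected because the first component, which follows $M$, gets stuck; this is the information that is lost in the compressed automaton alone.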


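For a formal proof that $L_{i/d} = L(M_{i/d})$ we need the following lemma.

Lemma 7. Suppose $\bar{q} \xrightarrow{0_+} \bar{r}$ for $\bar{q}, \bar{r} \in \bar{Q}$. Then $q \stackrel{0^k}{\Longrightarrow} r$ in $M$ iff $\exists q' \in Q$:
$(q, \bar{q}, 0) \xrightarrow{0} (q', \bar{r}, N) \stackrel{0^{k-1}}{\Longrightarrow} (r, \bar{r}, 0)$ in $M_{i/d}$, for all $q, r \in Q$ and $k > N$.

Proof. Suppose $q \stackrel{0^k}{\Longrightarrow} r$ in $M$. Then there exist $q', r' \in Q$ such that

\[ q \xrightarrow{0} q' \stackrel{0^{k-N-1}}{\Longrightarrow} r' \stackrel{0^N}{\Longrightarrow} r \]

and since $\bar{q} \xrightarrow{0_+} \bar{r}$ one has $(q, \bar{q}, 0) \xrightarrow{0} (q', \bar{r}, N)$. Next we get

\[ (q', \bar{r}, N) \stackrel{0^{k-N-1}}{\Longrightarrow} (r', \bar{r}, N) \stackrel{0^N}{\Longrightarrow} (r, \bar{r}, 0) \]

in which in each of the last $N$ steps the third argument decreases by 1. Since
$1 + (k-N-1) + N = k$, this proves the implication from left to right.

Conversely, suppose $(q, \bar{q}, 0) \xrightarrow{0} (q', \bar{r}, N) \stackrel{0^{k-1}}{\Longrightarrow} (r, \bar{r}, 0)$. Then, by
definition of $M_{i/d}$, we have $q \xrightarrow{0} q' \stackrel{0^{k-1}}{\Longrightarrow} r$ in $M$.

Now we are in a position to provide a proof of Theorem 1.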


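Proof of Theorem 1. We show that, in the above setting, $L_{i/d} = L(M_{i/d})$.
Suppose $v = 0^{m_0} a_1 0^{m_1} \cdots a_n 0^{m_n} \in L_{i/d}$ where $a_i \in \Sigma$ for $i = 1, \ldots, n$ and
$m_i \geq 0$ for $i = 0, \ldots, n$. Write $\bar{v} = 0_{p_0} a_1 0_{p_1} \cdots a_n 0_{p_n}$ with $p_i \in \{ 0, \ldots, N, + \}$
corresponding to $m_i$, for $i = 0, \ldots, n$. Then by Corollary 1 we have $\bar{v} \in \bar{L}_{\bar{G}}$ and
for some $u = 0^{k_0} a_1 0^{k_1} \cdots a_n 0^{k_n} \in L$ we have that $k_i = p_i$ if $p_i \in \{ 0, \ldots, N \}$
and $k_i = m_i$ and $k_i > N$ if $p_i = +$, for $i = 0, \ldots, n$. Thus, $u \in L$ is accepted
by $M$ and $\bar{v} \in \bar{L}_{\bar{G}}$ is accepted by $\bar{M}$. So, next to the initial states $q_0$ of $M$ and
$\bar{q}_0$ of $\bar{M}$, there exist $q_1, \ldots, q_n, r_0, \ldots, r_n \in Q$ and $\bar{q}_1, \ldots, \bar{q}_n, \bar{r}_0, \ldots, \bar{r}_n \in \bar{Q}$
such that

\[ q_i \stackrel{0^{k_i}}{\Longrightarrow} r_i \text{ in } M \text{ and } \bar{q}_i \xrightarrow{0_{p_i}} \bar{r}_i \text{ in } \bar{M}, \quad \text{for } i = 0, \ldots, n \]
\[ r_{i-1} \xrightarrow{a_i} q_i \text{ in } M \text{ and } \bar{r}_{i-1} \xrightarrow{a_i} \bar{q}_i \text{ in } \bar{M}, \quad \text{for } i = 1, \ldots, n \]
\[ r_n \in F \text{ and } \bar{r}_n \in \bar{F} \]

We observe: (i) $(q_i, \bar{q}_i, 0) \stackrel{0^{m_i}}{\Longrightarrow} (r_i, \bar{r}_i, 0)$, for $i = 0, \ldots, n$. In case $p_i \in
\{ 0, \ldots, N \}$ this follows from $m_i = k_i = p_i$ and the second transition type
of Definition 6 for $M_{i/d}$. In case $p_i = +$ this follows from Lemma 7.
(ii) $(r_{i-1}, \bar{r}_{i-1}, 0) \xrightarrow{a_i} (q_i, \bar{q}_i, 0)$, for $i = 1, \ldots, n$ based on the first type of
transition for $M_{i/d}$. Thus, $v = 0^{m_0} a_1 0^{m_1} \cdots a_n 0^{m_n} \in L(M_{i/d})$.

Conversely, pick $v = 0^{m_0} a_1 0^{m_1} \cdots a_n 0^{m_n} \in L(M_{i/d})$. Then there are
states $q_1, \ldots, q_n, r_0, \ldots, r_n \in Q$ and $\bar{q}_1, \ldots, \bar{q}_n, \bar{r}_0, \ldots, \bar{r}_n \in \bar{Q}$ such that
$r_n \in F$ and $\bar{r}_n \in \bar{F}$ and $(q_i, \bar{q}_i, 0) \stackrel{0^{m_i}}{\Longrightarrow} (r_i, \bar{r}_i, 0)$ for $i = 0, \ldots, n$, and
$(r_{i-1}, \bar{r}_{i-1}, 0) \xrightarrow{a_i} (q_i, \bar{q}_i, 0)$ for $i = 1, \ldots, n$, using that $a_i$-steps in $M_{i/d}$ are
only possible from and to states having 0 as their third coordinate. From
$(r_{i-1}, \bar{r}_{i-1}, 0) \xrightarrow{a_i} (q_i, \bar{q}_i, 0)$ we conclude $r_{i-1} \xrightarrow{a_i} q_i$ in $M$, and $\bar{r}_{i-1} \xrightarrow{a_i} \bar{q}_i$
in $\bar{M}$, for $i = 1, \ldots, n$.

Using $L(\bar{M}) = \bar{L}_{\bar{G}} \subseteq L(O \cdot (\Sigma \cdot O)^*)$, we may assume without loss of
generality that every $\bar{q} \in \bar{Q}$ has either only outgoing $O$ transitions and
incoming $\Sigma$ transitions in $\bar{M}$, or vice versa. Here, the states $\bar{q}_i$ are of the
first type and the states $\bar{r}_i$ are of the second type. Moreover, without loss
of generality, we may assume that for every two states $\bar{q}, \bar{r} \in \bar{Q}$ there is at
most one $O$-transition from $\bar{q}$ to $\bar{r}$ in $\bar{M}$. Now, by the form of the rules
in Definition 6, from $(q_i, \bar{q}_i, 0) \stackrel{0^{m_i}}{\Longrightarrow} (r_i, \bar{r}_i, 0)$ we conclude $q_i \stackrel{0^{k_i}}{\Longrightarrow} r_i$ and
$\bar{q}_i \xrightarrow{0_{p_i}} \bar{r}_i$ for $p_i \in \{ 0, \ldots, N, + \}$, for some $k_i \geq 0$, for $i = 0, \ldots, n$. So,

\[ q_0 \stackrel{0^{k_0}}{\Longrightarrow} r_0 \xrightarrow{a_1} q_1 \cdots r_{n-1} \xrightarrow{a_n} q_n \stackrel{0^{k_n}}{\Longrightarrow} r_n \in F \]
\[ \bar{q}_0 \xrightarrow{0_{p_0}} \bar{r}_0 \xrightarrow{a_1} \bar{q}_1 \cdots \bar{r}_{n-1} \xrightarrow{a_n} \bar{q}_n \xrightarrow{0_{p_n}} \bar{r}_n \in \bar{F} \]

Therefore, $u = 0^{k_0} a_1 0^{k_1} \cdots a_n 0^{k_n} \in L$ and $\bar{v} = 0_{p_0} a_1 0_{p_1} \cdots a_n 0_{p_n} \in \bar{L}_{\bar{G}}$.
If $p_i \in \{ 0, \ldots, N \}$, then by the assumptions on the transitions in $\bar{M}$ and
uniqueness of $O$ steps, we conclude $m_i = p_i$. If $p_i = +$, then from Lemma 7
we conclude that $m_i = k_i$. Since, by construction, from a state with its
third coordinate of value $N$ it takes at least $N$ steps to get down at 0, we
conclude $k_i > N$. This holds for all $i = 0, \ldots, n$. Therefore, by Corollary 1
we conclude that $v = 0^{m_0} a_1 0^{m_1} \cdots a_n 0^{m_n} \in L_{i/d}$.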


8 Related Work and Concluding Remarks 


In this paper we discussed specific concepts of string rewriting: a more flexible 
notion focusing on insertions and deletions of a dummy symbol, another 
more strict notion based on an equivalence relation. Given a language L 
we considered the extended languages L;/q and Lg comprising the closure 
of L for the two types of guided rewriting with guides from a finite set G. 
In particular, as our main results we proved that these closures preserve 
regularity. For doing so we investigated the local effect of guided rewriting 
on two consecutive string positions, leading to a novel notion of a slice 
sequence. Finally, the theorem for adjustment-based rewriting was proved 
by an automaton construction exploiting a slice sequence characterization of 
guided rewriting. Via a compression scheme for strings of dummy symbols, 
the theorem for guided insertion/deletion followed. 

Preservation of regularity by closing a language with respect to a given 
notion of rewriting arises as a natural question. In Section 3 we observed 
that by closing the regular language $L((ab)^*)$ under rewriting with respect 
to the single rewrite rule $ba \to ab$ the resulting language is not regular. So, 
by arbitrary string rewriting regularity is not necessarily preserved. A couple 
of specific rewrite formats have been proposed in the literature. In [10] it 
was proved that regularity is preserved by deleting string rewriting, where a 
string rewriting system is called deleting if there exists a partial ordering 
on its alphabet such that each letter in the right-hand side of a rule is less 
than some letter in the corresponding left-hand side. In [13] it was proved 
that regularity is preserved by so-called period expanding or period reducing 
string rewriting. When translated to the setting of [21], as also touched 
upon in Section 3, our present notion of guided insertions and deletions 
allows for simultaneous insertion and deletion of the dummy symbol, a 
phenomenon also supported by biological findings. Remarkably, the more 
liberal guided insertion/deletion approach preserves regularity, whereas in 
the more restricted mechanism of [21], not mixing insertions and deletions 
per rewrite step, regularity is not preserved. 
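
The non-regularity observation for the rule ba -> ab recalled above can be checked experimentally for small instances. The following Python sketch is an illustration only, not taken from the paper: the function names `closure` and `sorted_members` are hypothetical. It computes, by breadth-first search, all strings reachable from (ab)^n and then keeps those lying in a*b*. For each n tried, the only such string is a^n b^n. Since regular languages are closed under intersection and { a^n b^n } is not regular, this suggests one standard way to see why the closure cannot be regular; the argument in Section 3 may differ.

```python
from collections import deque

def closure(start, lhs, rhs):
    """All strings reachable from `start` by repeatedly replacing one
    occurrence of `lhs` by `rhs` (terminates here because the rule is
    length-preserving, so only finitely many strings are reachable)."""
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        i = s.find(lhs)
        while i != -1:
            t = s[:i] + rhs + s[i + len(lhs):]
            if t not in seen:
                seen.add(t)
                queue.append(t)
            i = s.find(lhs, i + 1)
    return seen

def sorted_members(n):
    """Strings in the closure of (ab)^n under ba -> ab that belong to a*b*.
    Over the alphabet {a, b}, a string is in a*b* iff it has no factor 'ba'."""
    reach = closure("ab" * n, "ba", "ab")
    return {s for s in reach if "ba" not in s}

if __name__ == "__main__":
    for n in range(1, 5):
        print(n, sorted_members(n))
```

The search is exhaustive and therefore only feasible for small n; it is meant as a sanity check of the observation, not as a proof.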


Another crucial difference with the mechanism of [21] is the following: 
for that format it was shown that strings u, v of length n exist satisfying 
$u \Rightarrow^* v$, while the length of any such reduction is at least exponential 
in n. We expect this not to be the case in our present format: our slice 
characterization of guided rewriting should serve to prove that if 
$u \Rightarrow^* v$ then there is always a corresponding reduction of length 
linear in n. Details have not yet been 
worked out. 


As mentioned in the introduction, the computational power of a variant 
of insertion-deletion systems was studied in [20]. There deletion means that 
a string uav is replaced by uv for a predefined finite set of triples u, a, v, 
while by insertion a string uv is replaced by uav for another predefined finite 
set of triples u,a,v. This notion of insertion-deletion is quite different from 
ours, and seems less related to biological RNA editing. In the same vein are 
the guided insertion/deletion systems of [4]. There a hierarchy of classes 
of insertion/deletion systems and related closure properties are studied. 
Additionally, a non-mixing insertion/deletion system that models part of 
the RNA-editing for kinetoplastids is given. A rather different application 
of term rewriting in the setting of RNA is reported in [8], where the rewrite 
engine of Maude is exploited to predict the occurrence of specific patterns 
in the spatial formation of RNA, with competitive precision compared to 
techniques that are more frequently used in bioinformatics. 
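
The context-dependent insertion and deletion steps attributed to [20] above can be made concrete with a small Python sketch. It is an illustration under assumptions, not the formal definition of [20]: the function names `delete_steps` and `insert_steps` are hypothetical, and the components u, a, v of each triple are simply taken to be strings.

```python
def delete_steps(s, triples):
    """All strings obtained from s by one deletion step:
    an occurrence of u+a+v is replaced by u+v, for some triple (u, a, v)."""
    out = set()
    for u, a, v in triples:
        pat = u + a + v
        i = s.find(pat)
        while i != -1:
            out.add(s[:i] + u + v + s[i + len(pat):])
            i = s.find(pat, i + 1)
    return out

def insert_steps(s, triples):
    """All strings obtained from s by one insertion step:
    an occurrence of u+v is replaced by u+a+v, for some triple (u, a, v)."""
    out = set()
    for u, a, v in triples:
        pat = u + v
        i = s.find(pat)
        while i != -1:
            out.add(s[:i] + u + a + v + s[i + len(pat):])
            i = s.find(pat, i + 1)
    return out

if __name__ == "__main__":
    rules = [("a", "b", "c")]
    print(delete_steps("abcabc", rules))
    print(insert_steps("ac", rules))
```

In this sketch the contexts u and v are fixed per rule and no guide string is involved, which illustrates how this notion differs from the guided insertion/deletion studied here.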


Possible future work includes the investigation of preservation of context- 
freedom and of lifting the bound on the number of consecutive 0’s in Theorem 
1. More specifically, for a context-free language L, does it hold, for a 
finite set of guides G, that $L_G$ is context-free too? Considering the set 
of guides, a generalization to regular sets G is worth studying. Note 
that the counter-example given in Section 4 involves a non-regular set of 
guides. So, if L is regular and G is regular, do we have that $L_G$ is regular? 
Similarly for L context-free. H.J. Hoogeboom suggested to us [11] to consider 
cones of languages in the sense of Nivat [16], exploiting the closure 
under finite state transductions. Shortly before the submission of the final 
version of this paper, along these lines a partial result restricting to guided 
rewriting only has been established by J. van Engelen [7]. Generalizing 
guided insertion/deletion, we also plan to consider guided rewriting based 
on other types of adjustment relations. In particular, rather than comparing 
strings symbol-by-symbol, one can consider two strings compatible if they 
map to the same string for a chosen string homomorphism. A prime example 
would be the erasing of the dummy 0 in the context of Section 3 for which 
we conjecture a variant of Theorem 2 to hold. 
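
The homomorphism-based notion of compatibility suggested above can be sketched in a few lines of Python. This is only an illustration of the proposed idea. The names `apply_hom`, `ERASE_ZERO` and `compatible` are hypothetical, and a homomorphism is represented, as an assumption of this sketch, by a dictionary from letters to strings, with letters missing from the dictionary mapped to themselves.

```python
def apply_hom(h, s):
    """Apply a string homomorphism letter by letter; letters not
    mentioned in h are mapped to themselves."""
    return "".join(h.get(c, c) for c in s)

# The homomorphism that erases the dummy symbol 0 and keeps all other letters.
ERASE_ZERO = {"0": ""}

def compatible(u, v, h=ERASE_ZERO):
    """Two strings are compatible if they have the same image under h."""
    return apply_hom(h, u) == apply_hom(h, v)

if __name__ == "__main__":
    print(compatible("a0b", "ab00"))
    print(compatible("a0b", "ba"))
```

Unlike the symbol-by-symbol comparison used for guided rewriting, compatible strings here may differ in length.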